Abstract

Consider estimating an unknown, but structured (e.g. sparse, low-rank, etc.), signal $x_0 \in \mathbb{R}^n$ from a vector
$y \in \mathbb{R}^m$ of measurements of the form $y_i = g_i(a_i^T x_0)$, where the $a_i$'s are the rows of a known measurement
matrix $A$, and $g(\cdot)$ is a (potentially unknown) nonlinear and random link-function. Such measurement functions
could arise in applications where the measurement device has nonlinearities and uncertainties. They could also arise by
design, e.g., $g_i(x) = \mathrm{sign}(x + z_i)$ corresponds to noisy 1-bit quantized measurements. Motivated by the classical
work of Brillinger, and more recent work of Plan and Vershynin, we estimate $x_0$ via solving the Generalized-
LASSO, i.e., $\hat{x} := \arg\min_x \|y - Ax\|_2 + \lambda f(x)$ for some regularization parameter $\lambda > 0$ and some (typically
non-smooth) convex regularizer $f(\cdot)$ that promotes the structure of $x_0$, e.g. $\ell_1$-norm, nuclear-norm, etc. While this
approach seems to naively ignore the nonlinear function $g(\cdot)$, both Brillinger (in the non-constrained case) and Plan
and Vershynin have shown that, when the entries of $A$ are i.i.d. standard normal, this is a good estimator of $x_0$ up to
a constant of proportionality $\mu$, which only depends on $g(\cdot)$. In this work, we considerably strengthen these results
by obtaining explicit expressions for $\|\hat{x} - \mu x_0\|_2$, for the regularized Generalized-LASSO, that are asymptotically
precise when $m$ and $n$ grow large. A main result is that the estimation performance of the Generalized LASSO with
non-linear measurements is asymptotically the same as one whose measurements are linear, $y_i = \mu a_i^T x_0 + \sigma z_i$,
with $\mu = E[\gamma g(\gamma)]$ and $\sigma^2 = E[(g(\gamma) - \mu\gamma)^2]$, and $\gamma$ standard normal. To the best of our knowledge, the derived
expressions on the estimation performance are the first-known precise results in this context. One interesting 
consequence of our result is that the optimal quantizer of the measurements that minimizes the estimation error 
of the Generalized LASSO is the celebrated Lloyd-Max quantizer. 


I. Introduction 

A. Problem Setup 

1) Non-linear Measurements: Consider the problem of estimating an unknown signal vector $x_0 \in \mathbb{R}^n$ from a
vector $y = (y_1, y_2, \ldots, y_m)^T$ of $m$ measurements taking the following form:

$$y_i = g_i(a_i^T x_0), \quad i = 1, 2, \ldots, m. \tag{1}$$

Here, each $a_i$ represents a (known) measurement vector. The $g_i$'s are independent copies of a (generically random)
link function $g$. For instance, $g_i(x) = x + z_i$, with say $z_i$ being normally distributed, recovers the standard linear
regression setup with Gaussian noise. In this paper, we are particularly interested in scenarios where $g$ is non-linear.
Notable examples include $g(x) = \mathrm{sign}(x)$ (or $g_i(x) = \mathrm{sign}(x + z_i)$) and $g(x) = (x)_+$, corresponding to 1-bit
quantized (noisy) measurements, and to the censored Tobit model, respectively. Depending on the situation, $g$
might be known or unspecified. In the statistics and econometrics literature, the measurement model in (1) is
popular under the name single-index model and several aspects of it have been well-studied, e.g. [Bri82], [Bri77],
[Ich93], [LD89].
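As a concrete illustration of the measurement model in (1), the following is a minimal sketch, using only the Python standard library, that draws a Gaussian measurement matrix and applies one of the link functions named above (linear with noise, noisy sign, Tobit). The helper names `make_link` and `measure` and the `noise_std` parameter are hypothetical and chosen only for this illustration; they do not come from the paper.

```python
import random


def sign(v):
    """Sign with the convention sign(0) = +1 (an arbitrary tie-break)."""
    return 1.0 if v >= 0 else -1.0


def make_link(kind, noise_std, rng):
    """Draw one independent copy g_i of the link function g (hypothetical helper)."""
    z = rng.gauss(0.0, 1.0) * noise_std  # noise term z_i, scaled by noise_std
    if kind == "linear":
        return lambda x: x + z           # g_i(x) = x + z_i
    if kind == "sign":
        return lambda x: sign(x + z)     # g_i(x) = sign(x + z_i)
    if kind == "tobit":
        return lambda x: max(x, 0.0)     # g(x) = (x)_+
    raise ValueError("unknown link: " + kind)


def measure(x0, m, kind, noise_std=0.0, seed=0):
    """Return (A, y) with A having i.i.d. N(0,1) entries and y_i = g_i(a_i^T x0)."""
    rng = random.Random(seed)
    n = len(x0)
    A = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(m)]
    y = []
    for a in A:
        g_i = make_link(kind, noise_std, rng)
        y.append(g_i(sum(aj * xj for aj, xj in zip(a, x0))))
    return A, y
```

For example, `measure([1.0, 0.0, 0.0], 50, "sign", noise_std=0.3)` produces 50 noisy 1-bit measurements of a unit-norm signal.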

2) Structured Signals: It is typical in many instances that the unknown signal $x_0$ obeys some sort of structure.
For instance, it might be sparse, in which case only a few $k \ll n$ of its entries are non-zero; or, it might be
that $x_0 = \mathrm{vec}(X_0)$, where $X_0 \in \mathbb{R}^{\sqrt{n}\times\sqrt{n}}$ is a matrix of low rank $r \ll \sqrt{n}$. To exploit this information it is
typical to associate with the structure of $x_0$ a properly chosen function $f : \mathbb{R}^n \to \mathbb{R}$, which we refer to as the
regularizer. Of particular interest are convex and non-smooth such regularizers, e.g. the $\ell_1$-norm for sparse signals,
the nuclear-norm for low-rank ones, etc. Please refer for example to [CRPW12], [Bac10], [HC14], [ALMT13]
for further discussions.

3) An Algorithm for Linear Measurements: The Generalized LASSO: When the link function is linear, i.e.
$g_i(x) = x + z_i$, perhaps the most popular way of estimating $x_0$ is via solving the Generalized LASSO algorithm:

$$\hat{x} := \arg\min_x \|y - Ax\|_2 + \lambda f(x). \tag{2}$$

Here, $A = [a_1, a_2, \ldots, a_m]^T \in \mathbb{R}^{m\times n}$ is the known measurement matrix and $\lambda > 0$ is a regularizer parameter.
This is often referred to as the $\ell_2$-LASSO or the square-root-LASSO [BCW11] to distinguish it from the one which
solves $\min_x \frac{1}{2}\|y - Ax\|_2^2 + \lambda f(x)$, instead. The results of this paper can be adapted to this latter version, but
for concreteness, we restrict attention to (2) throughout. The acronym LASSO for (2) was introduced in [Tib96]
for the special case of $\ell_1$-regularization; (2) is a natural generalization to other kinds of structures and includes
the group-LASSO [YL06] and the fused-LASSO [TSR+05] as special cases. We often drop the term "Generalized"
and refer to (2) simply as the LASSO.
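To make the estimator tangible, here is a minimal sketch of a solver for the squared-loss variant mentioned above, $\min_x \frac{1}{2}\|y - Ax\|_2^2 + \lambda\|x\|_1$, using proximal gradient descent (ISTA). This is an assumption made for illustration: the paper analyzes (2) and does not prescribe a numerical solver, and ISTA is swapped in here only because it fits in a few lines of dependency-free Python. All function names are hypothetical.

```python
def soft(v, t):
    """Scalar soft-thresholding: shrink v toward zero by t."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def matvec(A, x):
    """Compute A x for a list-of-rows matrix A."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]


def rmatvec(A, r):
    """Compute A^T r."""
    return [sum(A[i][j] * r[i] for i in range(len(A))) for j in range(len(A[0]))]


def objective(A, y, x, lam):
    """0.5 * ||y - A x||_2^2 + lam * ||x||_1."""
    res = [yi - pi for yi, pi in zip(y, matvec(A, x))]
    return 0.5 * sum(r * r for r in res) + lam * sum(abs(v) for v in x)


def ista(A, y, lam, iters=500):
    """Proximal gradient iterations for the squared-loss l1-LASSO, started at 0."""
    n = len(A[0])
    # The squared Frobenius norm upper-bounds the squared spectral norm of A,
    # so 1/L is a safe (conservative) step size.
    L = sum(a * a for row in A for a in row)
    x = [0.0] * n
    for _ in range(iters):
        res = [pi - yi for pi, yi in zip(matvec(A, x), y)]
        grad = rmatvec(A, res)  # gradient of the smooth part, A^T (A x - y)
        x = [soft(xj - gj / L, lam / L) for xj, gj in zip(x, grad)]
    return x
```

The conservative step size keeps the sketch short at the cost of slow convergence; a production solver would use a tighter Lipschitz estimate or an accelerated method.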

One popular measure of estimation performance of (2) is the squared error $\|\hat{x} - x_0\|_2^2$. Recently, there have been
significant advances on establishing tight bounds and even precise characterizations of this quantity, in the presence
of linear measurements [DMM11], [BM12], [Sto13], [OTH13], [TPH15], [TOH]. Such precise results have been
core to building a better understanding of the behavior of the LASSO, and, in particular, of the exact role played
by the choice of the regularizer $f$ (in accordance with the structure of $x_0$), by the number of measurements $m$,
by the value of $\lambda$, etc. In certain cases, they even provide us with useful insights into practical matters such as
the tuning of the regularizer parameter.

4) Using the LASSO for Non-linear Measurements?: The LASSO is by nature tailored to a linear model for
the measurements. Indeed, the first term of the objective function in (2) tries to fit $Ax$ to the observed vector
$y$ presuming that this is of the form $y_i = a_i^T x_0 + \text{noise}$. Of course, no one stops us from continuing to use it
even in cases where $y_i = g(a_i^T x_0)$ with $g$ being non-linear.¹ But, the question then becomes: Can there be any
guarantees that the solution $\hat{x}$ of the Generalized LASSO is still a good estimate of $x_0$?

The question just posed was first studied back in the early 80's by Brillinger [Bri82], who provided answers in
the case of solving (2) without a regularizer term. This, of course, corresponds to standard Least Squares (LS).
Interestingly, he showed that when the measurement vectors are Gaussian, the LS solution is a consistent
estimate of $x_0$, up to a constant of proportionality $\mu$, which only depends on the link-function $g$. The result is
sharp, but only under the assumption that the number of measurements $m$ grows large, while the signal dimension
$n$ stays fixed, which was the typical setting of interest at the time. In the world of structured signals and
high-dimensional measurements, the problem was only very recently revisited by Plan and Vershynin [PV15]. They
consider a constrained version of the Generalized LASSO, in which the regularizer is essentially replaced by a
constraint, and derive upper bounds on its performance. The bounds are not tight (they involve absolute constants),
but they demonstrate some key features: i) the solution $\hat{x}$ to the constrained LASSO is a good estimate of $x_0$
up to the same constant of proportionality $\mu$ that appears in Brillinger's result; ii) thus, $\|\hat{x} - \mu x_0\|_2$ is a natural
measure of performance; iii) estimation is possible even with $m < n$ measurements by taking advantage of the
structure of $x_0$.

¹ Note that the Generalized LASSO in (2) does not assume knowledge of $g$. All that is assumed is the availability of the measurements
$y_i$. Thus, the link-function might as well be unknown or unspecified.

B. Summary of Contributions 

Inspired by the work of Plan and Vershynin [PV15], and motivated by recent advances on the precise analysis
of the Generalized LASSO with linear measurements, this paper extends these latter results to the case of
non-linear measurements. When the measurement matrix $A$ has i.i.d. Gaussian entries (henceforth, we assume this to
be the case without further reference), and the estimation performance is measured in a mean-squared-error sense,
we are able to precisely predict the asymptotic behavior of the error. The derived expression accurately captures
the role of the link function $g$, the particular structure of $x_0$, the role of the regularizer $f$, and the value of the
regularizer parameter $\lambda$. Further, it holds for all values of $\lambda$, and for a wide class of functions $f$ and $g$.

Interestingly, our result shows in a very precise manner that in large dimensions, modulo the information about
the magnitude of $x_0$, the LASSO treats non-linear measurements exactly as if they were scaled and noisy linear
measurements with scaling factor $\mu$ and noise variance $\sigma^2$ defined as

$$\mu := E[\gamma g(\gamma)], \quad \text{and} \quad \sigma^2 := E[(g(\gamma) - \mu\gamma)^2], \quad \text{for } \gamma \sim \mathcal{N}(0,1), \tag{3}$$

where the expectation is with respect to both $\gamma$ and $g$. In particular, when $g$ is such that $\mu \neq 0$,² then,

the estimation performance of the Generalized LASSO with measurements of the form $y_i = g_i(a_i^T x_0)$ is
asymptotically the same as if the measurements were rather of the form $y_i = \mu a_i^T x_0 + \sigma z_i$, with $\mu$, $\sigma^2$ as in (3)
and $z_i$ standard Gaussian noise.
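The two parameters in (3) are one-dimensional Gaussian expectations, so for a deterministic link they can be approximated by simple quadrature. The sketch below is an illustration only; the function names and the integration grid are assumptions, and a random link $g$ would additionally require averaging over its randomness. It evaluates $\mu$ and $\sigma^2$ for $g(x) = \mathrm{sign}(x)$ and for the Tobit link $g(x) = (x)_+$. For the sign link, direct calculation from (3) gives $\mu = E|\gamma| = \sqrt{2/\pi}$ and $\sigma^2 = 1 - 2/\pi$.

```python
import math


def gaussian_expectation(func, lo=-10.0, hi=10.0, steps=40_000):
    """Midpoint-rule approximation of E[func(gamma)] for gamma ~ N(0, 1)."""
    h = (hi - lo) / steps
    total = 0.0
    for k in range(steps):
        g = lo + (k + 0.5) * h
        total += func(g) * math.exp(-0.5 * g * g)
    return total * h / math.sqrt(2.0 * math.pi)


def link_parameters(g):
    """Return (mu, sigma^2) from (3) for a deterministic link function g."""
    mu = gaussian_expectation(lambda t: t * g(t))
    sigma2 = gaussian_expectation(lambda t: (g(t) - mu * t) ** 2)
    return mu, sigma2


def sign_link(t):
    return 1.0 if t >= 0 else -1.0


def tobit_link(t):
    return max(t, 0.0)


if __name__ == "__main__":
    for name, g in (("sign", sign_link), ("tobit", tobit_link)):
        print(name, link_parameters(g))
```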

Recent analysis of the squared-error of the LASSO, when used to recover structured signals from noisy linear
observations, provides us with either precise predictions (e.g. [TPGH15], [BM12]), or in other cases, with tight
upper bounds (e.g. [OTH13], [DMM11]). Owing to the established relation between non-linear and (corresponding)
linear measurements, such results also characterize the performance of the LASSO in the presence of nonlinearities.
We remark that some of the error formulae derived here in the general context of non-linear measurements have
not been previously known even under the prism of linear measurements.

Figure 1 serves as an illustration; the error with non-linear measurements matches well with the error of the 
corresponding linear ones and both are accurately predicted by our analytic expression. 

Under the generic model in (1), which allows for $g$ to even be unspecified, $x_0$ can, in principle, be estimated
only up to a constant of proportionality [Bri82], [LD89], [PV15]. For example, if $g$ is unknown then any information
about the norm $\|x_0\|_2$ could be absorbed in the definition of $g$. The same is true when $g(x) = \mathrm{sign}(x)$, even though
$g$ might be known here. In these cases, what becomes important is the direction of $x_0$. Motivated by this, and in
order to simplify the presentation, we have assumed throughout that $x_0$ has unit Euclidean norm,³ i.e. $\|x_0\|_2 = 1$.

² This excludes for example link functions $g$ that are even, but also see [GP+13, Sec. 2.2].

Fig. 1: Squared error of the $\ell_1$-regularized LASSO with non-linear measurements (□) and with corresponding linear ones (*) as a function
of the regularizer parameter $\lambda$; both compared to the asymptotic prediction. Here, $g_i(x) = \mathrm{sign}(x + 0.3 z_i)$ with $z_i \sim \mathcal{N}(0,1)$. The unknown
signal $x_0$ is of dimension $n = 768$ and has $\lceil 0.15n \rceil$ non-zero entries (see Sec. II-C1 for details). The different curves correspond to $\lceil 0.75n \rceil$
and $\lceil 1.2n \rceil$ number of measurements, respectively. Simulation points are averages over 20 problem realizations.

C. Discussion of Relevant Literature 

Extending an Old Result. Brillinger [Bri82] identified the asymptotic behavior of the estimation error of the
LS solution $\hat{x}_{LS} = (A^T A)^{-1} A^T y$ by showing that, when $n$ (the dimension of $x_0$) is fixed,

$$\lim_{m\to\infty} \sqrt{m}\,\|\hat{x}_{LS} - \mu x_0\|_2 = \sigma\sqrt{n}, \tag{4}$$

where $\mu$ and $\sigma^2$ are the same as in (3). Our result can be viewed as a generalization of the above in several directions.
First, we extend (4) to the regime where $m/n = \delta \in (1, \infty)$ and both grow large by showing that

$$\lim_{n\to\infty} \|\hat{x}_{LS} - \mu x_0\|_2 = \frac{\sigma}{\sqrt{\delta - 1}}. \tag{5}$$

Second, and most importantly, we consider solving the Generalized LASSO instead, of which LS is only a very
special case. This allows versions of (5) where the error is finite even when $\delta < 1$ (e.g., see (8)). Note the
additional challenges faced when considering the LASSO: i) $\hat{x}$ no longer has a closed-form expression, ii) the
result needs to additionally capture the role of $x_0$, $f$, and $\lambda$.
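As a worked illustration of (5), read here as $\sigma/\sqrt{\delta - 1}$, the sketch below evaluates the predicted least-squares error for the 1-bit link $g(x) = \mathrm{sign}(x)$, for which (3) gives $\sigma^2 = 1 - 2/\pi$. The function name is hypothetical; the code only evaluates the closed form and does not simulate least squares.

```python
import math


def ls_error_prediction(sigma, delta):
    """Asymptotic value of ||x_LS - mu * x0||_2 according to (5); needs delta > 1."""
    if delta <= 1.0:
        raise ValueError("(5) requires delta = m/n > 1")
    return sigma / math.sqrt(delta - 1.0)


# sigma for g = sign, from (3): sigma^2 = 1 - 2/pi.
SIGMA_SIGN = math.sqrt(1.0 - 2.0 / math.pi)

if __name__ == "__main__":
    for delta in (1.5, 2.0, 5.0, 50.0):
        print(delta, ls_error_prediction(SIGMA_SIGN, delta))
```

The prediction diverges as $\delta \to 1$ from above and decays like $\sigma/\sqrt{\delta}$ for large $\delta$, which is the fixed-$n$ behavior described by (4).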


Motivated by Recent Work. Plan and Vershynin consider a constrained Generalized LASSO:

$$\hat{x}_{\text{C-LASSO}} = \arg\min_{x \in \mathcal{K}} \|y - Ax\|_2, \tag{6}$$

with $y$ as in (1) and $\mathcal{K} \subset \mathbb{R}^n$ some known set (not necessarily convex). In its simplest form, their result shows
that when $m \gtrsim D_{\mathcal{K}}(\mu x_0)$ then with high probability,

$$\|\hat{x}_{\text{C-LASSO}} - \mu x_0\|_2 \lesssim \frac{\sigma\sqrt{D_{\mathcal{K}}(\mu x_0)} + \zeta}{\sqrt{m}}. \tag{7}$$

Here, $D_{\mathcal{K}}(\mu x_0)$ is the Gaussian width, a specific measure of complexity of the constrained set $\mathcal{K}$ when viewed
from $\mu x_0$. For our purposes, it suffices to remark that if $\mathcal{K}$ is properly chosen, and if $\mu x_0$ is on the boundary
of $\mathcal{K}$, then $D_{\mathcal{K}}(\mu x_0)$ is less than $n$. Thus, estimation is in principle possible with $m < n$ measurements. The
parameters $\mu$ and $\sigma$ that appear in (7) are the same as in (3) and $\zeta := E[(g(\gamma) - \mu\gamma)^2\gamma^2]$. Observe that, in contrast
to (4) and to the setting of this paper, the result in (7) is non-asymptotic. Also, it suggests the critical role played
by $\mu$ and $\sigma$. On the other hand, (7) is only an upper bound on the error, and it also suffers from unknown absolute
proportionality constants (hidden in $\lesssim$).

³ In [PV15, Remark 1.8], they note that their results can be easily generalized to the case when $\|x_0\|_2 \neq 1$ by simply redefining
$\bar{g}(x) = g(\|x_0\|_2\, x)$ and accordingly adjusting the values of the parameters $\mu$ and $\sigma^2$ in (3). The very same argument is also true in our
case.

Moving the analysis into an asymptotic setting, our work expands upon the result of [PV15]. First, we consider
the regularized LASSO instead, which is more commonly used in practice. Most importantly, we improve the
loose upper bounds into precise expressions. In turn, this proves in an exact manner the role played by $\mu$ and $\sigma^2$, of
which (7) is only indicative. For a direct comparison with (7) we mention the following result which follows from
our analysis (we omit the proof for brevity). Assume $\mathcal{K}$ is convex, $m/n = \delta \in (0, \infty)$, $D_{\mathcal{K}}(\mu x_0)/n = \rho \in (0, 1]$
and $n \to \infty$. Also, $\delta > \rho$. Then, (7) yields an upper bound $C\sigma\sqrt{\rho/\delta}$ on the error, for some constant $C > 0$.
Instead, we show

$$\|\hat{x}_{\text{C-LASSO}} - \mu x_0\|_2 \le \sigma\,\frac{\sqrt{\rho}}{\sqrt{\delta - \rho}}. \tag{8}$$

Precise Analysis of the LASSO With Linear Measurements. The first precise formulae predicting the
limiting behavior of the LASSO reconstruction error were established in [DMM11], [BM12], [Sto13]. The authors of
[DMM11], [BM12] consider the $\ell_2^2$-LASSO with $\ell_1$-regularization and the analysis is based on the Approximate
Message Passing (AMP) framework [DMM09]; see also [TMSB13], [MAYB13] for extensions. A more general line
of work, [Sto13], [OTH13], [TPGH15], [TPH15], studies the problem using a recently developed framework that
is based on Gordon's Gaussian min-max Theorem (GMT) [Sto13], [TOH]. The GMT framework was initially
used by Stojnic [Sto13] to derive tight upper bounds on the constrained LASSO with $\ell_1$-regularization; [OTH13]
generalized those to general convex regularizers and also to the $\ell_2$-LASSO; the case of the $\ell_2^2$-LASSO was studied
in [TPH15]. Those bounds hold for all values of SNR, but they become tight only in the high-SNR regime. A
precise error expression was derived in [TPGH15] for the $\ell_2$-LASSO with $\ell_1$-regularization under a Gaussianity
assumption on the distribution of the non-zero entries of $x_0$. When measurements are linear, our Theorem 2.3
generalizes this assumption; in its current form, it provides the first-known counterpart of the main result of
[BM12] for the $\ell_2$-LASSO. Our main Theorem 2.2 provides error predictions for regularizers going beyond the
$\ell_1$-norm, e.g. $\ell_{1,2}$-norm, nuclear norm, which appear to be novel. When it comes to non-linear measurements, to
the best of our knowledge, this paper is the first to derive asymptotically precise results on the performance of
any LASSO-type algorithm.


II. Results 


A. Modeling Assumptions 

Unknown structured signal. We let $x_0 \in \mathbb{R}^n$ represent the unknown signal vector. We assume that

$$x_0 = \bar{x}_0 / \|\bar{x}_0\|_2,$$

with $\bar{x}_0$ sampled from a probability density $p_{\bar{x}_0}$ in $\mathbb{R}^n$. Thus, $x_0$ is deterministically of unit Euclidean norm (this
is mostly to simplify the presentation, see Footnote 3). Information about the structure of $x_0$ (and correspondingly
of $\bar{x}_0$) is encoded in $p_{\bar{x}_0}$. For instance, to study an $x_0$ which is sparse, it is typical to assume that its entries are
i.i.d. $\bar{x}_{0,i} \sim (1 - \rho)\delta_0 + \rho\, q_{\bar{X}_0}$, where $\rho \in (0, 1)$ becomes the normalized sparsity level, $q_{\bar{X}_0}$ is a scalar p.d.f. and
$\delta_0$ is the Dirac delta function.⁴

Regularizer. We consider convex regularizers $f : \mathbb{R}^n \to \mathbb{R}$.

Measurement matrix. The entries of $A \in \mathbb{R}^{m\times n}$ are i.i.d. $\mathcal{N}(0,1)$.

Measurements and Link-function. We observe $y = \vec{g}(Ax_0)$ where $\vec{g}$ is a (possibly random) map from $\mathbb{R}^m$ to
$\mathbb{R}^m$ and $\vec{g}(u) = [g_1(u_1), \ldots, g_m(u_m)]^T$. Each $g_i$ is i.i.d. from a real valued random function $g$ for which $\mu$ and
$\sigma^2$ are defined in (3). We assume that $\mu$ and $\sigma^2$ are nonzero and bounded.

Asymptotics. We study a linear asymptotic regime. In particular, we consider a sequence of problem instances
$\{\bar{x}_0^{(n)}, A^{(n)}, f^{(n)}, m^{(n)}\}_{n\in\mathbb{N}}$ indexed by $n$ such that $A^{(n)} \in \mathbb{R}^{m\times n}$ has entries i.i.d. $\mathcal{N}(0,1)$, $f^{(n)} : \mathbb{R}^n \to \mathbb{R}$ is
proper convex, and $m := m^{(n)}$ with $m = \delta n$, $\delta \in (0, \infty)$. We further require that the following conditions hold:

(a) $\bar{x}_0^{(n)}$ is sampled from a probability density $p_{\bar{x}_0}^{(n)}$ in $\mathbb{R}^n$ with one-dimensional marginals that are independent
of $n$ and have bounded second moments. Furthermore, $n^{-1}\|\bar{x}_0^{(n)}\|_2^2 \xrightarrow{P} \sigma_x^2 = 1$.

(b) For any $n \in \mathbb{N}$ and any $\|x\|_2 \le C$, it holds $n^{-1/2} f(x) \le c_1$ and $n^{-1/2} \max_{s \in \partial f^{(n)}(x)} \|s\|_2 \le c_2$, for constants
$c_1, c_2, C \ge 0$ independent of $n$.

In (a), we used "$\xrightarrow{P}$" to denote convergence in probability as $n \to \infty$. The assumption $\sigma_x^2 = 1$ holds without
loss of generality, and is only necessary to simplify the presentation. In (b), $\partial f(x)$ denotes the subdifferential of
$f$ at $x$. The condition itself is no more than a normalization condition on $f$.

Every such sequence $\{\bar{x}_0^{(n)}, A^{(n)}, f^{(n)}\}_{n\in\mathbb{N}}$ generates a sequence $\{x_0^{(n)}, y^{(n)}\}_{n\in\mathbb{N}}$ where $x_0^{(n)} := \bar{x}_0^{(n)}/\|\bar{x}_0^{(n)}\|_2$
and $y^{(n)} := \vec{g}^{(n)}(Ax_0)$. When clear from the context, we drop the superscript $(n)$.
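The sparse signal model from the "Unknown structured signal" paragraph above can be sampled as follows. This is a minimal sketch under stated assumptions: the non-zero entries are taken to be Gaussian with second moment $1/\rho$ (the normalization used later in Section II-C1, so that the average squared entry is 1 as required by condition (a)), and the function name is hypothetical.

```python
import math
import random


def sample_sparse_signal(n, rho, seed=0):
    """Draw xbar_0 with i.i.d. Bernoulli(rho)-Gaussian entries and normalize it.

    Returns (xbar_0, x_0) with x_0 = xbar_0 / ||xbar_0||_2.
    """
    rng = random.Random(seed)
    scale = 1.0 / math.sqrt(rho)  # non-zero entries have second moment 1/rho
    xbar = [rng.gauss(0.0, scale) if rng.random() < rho else 0.0 for _ in range(n)]
    norm = math.sqrt(sum(v * v for v in xbar))
    if norm == 0.0:
        raise ValueError("all-zero draw; increase n or rho")
    return xbar, [v / norm for v in xbar]
```

Normalizing by the empirical norm makes $x_0$ exactly unit-norm, while $\bar{x}_0$ keeps entries of order one, which is the scaling under which condition (a) is stated.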

B. General Result 

Let $\{\bar{x}_0^{(n)}, A^{(n)}, f^{(n)}, y^{(n)}\}_{n\in\mathbb{N}}$ be a sequence of problem instances that satisfies the conditions of Section II-A.
With these, define the sequence $\{\hat{x}^{(n)}\}_{n\in\mathbb{N}}$ of solutions to the corresponding LASSO problems for fixed $\lambda > 0$:

$$\hat{x}^{(n)} := \arg\min_x \frac{1}{\sqrt{n}}\left\{\|y^{(n)} - A^{(n)}x\|_2 + \lambda f^{(n)}(x)\right\}. \tag{9}$$

The main contribution of this paper is a precise evaluation of $\lim_{n\to\infty} \|\mu^{-1}\hat{x}^{(n)} - x_0^{(n)}\|_2^2$ with high probability
over the randomness of $A$, of $x_0$, and of $g$. To state the result in a general framework, we require a further
assumption on $p_{\bar{x}_0}^{(n)}$ and $f^{(n)}$. Later in this section we illustrate how this assumption can be naturally met. We
write $f^*$ for the Fenchel conjugate of $f$, i.e., $f^*(v) := \sup_x x^T v - f(x)$; also, we call the proximal function of
$f$ to be $\mathrm{prox}_{f,\tau}(v) := \min_x \{\frac{1}{2}\|v - x\|_2^2 + \tau f(x)\}$.

Assumption 1: We say Assumption 1 holds if for all non-negative constants $c_1, c_2, c_3 \in \mathbb{R}$ the point-wise limit
of $\frac{1}{n}\mathrm{prox}_{\sqrt{n}(f^*)^{(n)},\,c_3}(c_1 h + c_2 \bar{x}_0)$ exists with probability one over $h \sim \mathcal{N}(0, I_n)$ and $\bar{x}_0 \sim p_{\bar{x}_0}^{(n)}$. Then, we denote
the limiting value as $F(c_1, c_2, c_3)$.

⁴ Such models for studying structured signals have been widely used in the relevant literature, e.g. [DJ94], [DMM11], [DJM13].
In fact, the results here continue to hold as long as the marginal distribution of $x_0$ converges to a given distribution (as in [MAYB13],
[BM12]).
Theorem 2.1 (Non-linear=Linear): Consider the asymptotic setup of Section II-A and let Assumption 1 hold.
Recall $\mu$ and $\sigma^2$ as in (3) and let $\hat{x}$ be the minimizer of the Generalized LASSO in (9) for fixed $\lambda > 0$ and for
measurements given by (1). Further let $\hat{x}^{\text{lin}}$ be the solution to the Generalized LASSO when used with linear
measurements of the form $y^{\text{lin}} = A(\mu x_0) + \sigma z$, where $z$ has entries i.i.d. standard normal. Then, in the limit of
$n \to \infty$, with probability one,

$$\|\hat{x} - \mu x_0\|_2^2 = \|\hat{x}^{\text{lin}} - \mu x_0\|_2^2.$$

Theorem 2.1 relates in a very precise manner the error of the Generalized LASSO under non-linear measurements
to the error of the same algorithm when used under appropriately scaled noisy linear measurements. Theorem 2.2
below derives an asymptotically exact expression for the error.


Theorem 2.2 (Precise Error Formula): Under the same assumptions of Theorem 2.1 and $\delta := m/n$, it holds,
with probability one,

$$\lim_{n\to\infty} \|\hat{x} - \mu x_0\|_2^2 = \alpha_*^2,$$

where $\alpha_*$ is the unique optimal solution to the convex program

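$$\max_{\substack{0 \le \beta \le 1 \\ \tau \ge 0}}\ \min_{\alpha \ge 0}\ \beta\sqrt{\delta}\sqrt{\alpha^2 + \sigma^2} - \frac{\alpha\tau}{2} + \frac{\mu^2\tau}{2\alpha} - \frac{\alpha\lambda^2}{\tau}\, F\!\left(\frac{\beta}{\lambda}, \frac{\mu\tau}{\lambda\alpha}, \frac{\tau}{\lambda\alpha}\right). \tag{10}$$

Also, the optimal cost of the LASSO in (9) converges to the optimal cost of the program in (10).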


Under the stated conditions, Theorem 2.2 proves that the limit of $\|\hat{x} - \mu x_0\|_2$ exists and is equal to the unique
solution of the optimization program in (10). Notice that this is a deterministic and convex optimization, which
only involves three scalar optimization variables. Thus, the optimal $\alpha_*$ can, in principle, be efficiently numerically
computed. In many specific cases of interest, with some extra effort, it is possible to obtain simpler expressions for
$\alpha_*$, e.g. see Theorem 2.3 below. The role of the normalized number of measurements $\delta = m/n$, of the regularizer
parameter $\lambda$, and that of $g$, through $\mu$ and $\sigma^2$, are explicit in (10); the structure of $x_0$ and the choice of the
regularizer $f$ are implicit in $F$. Figures 1-2 illustrate the accuracy of the prediction of the theorem in a number of
different settings. The proofs of both theorems are deferred to Appendix A. In the next sections, we specialize
Theorem 2.2 to the cases of sparse, group-sparse and low-rank signal recovery.


C. Examples 

1) Sparse Recovery: Assume each entry $\bar{x}_{0,i}$, $i = 1, \ldots, n$, is sampled i.i.d. from a distribution

$$p_{\bar{X}_0}(x) = (1 - \rho)\cdot\delta_0(x) + \rho\cdot q_{\bar{X}_0}(x), \tag{11}$$

where $\delta_0$ is the Dirac delta function, $\rho \in (0, 1)$ and $q_{\bar{X}_0}$ a probability density function with second moment
normalized to $1/\rho$ so that condition (a) of Section II-A is satisfied. Then, $x_0 = \bar{x}_0/\|\bar{x}_0\|_2$ is $\rho n$-sparse on average
and has unit Euclidean norm. Letting $f(x) = \|x\|_1$ also satisfies condition (b). Let us now check Assumption 1.
The Fenchel conjugate of the $\ell_1$-norm is simply the indicator function of the $\ell_\infty$ unit ball. Hence, without much
effort,


$$\frac{1}{n}\,\mathrm{prox}_{\sqrt{n}(f^*)^{(n)},\,c_3}(c_1 h + c_2 \bar{x}_0) = \frac{1}{2n}\sum_{i=1}^{n} \min_{|v_i| \le 1} \big(v_i - (c_1 h_i + c_2 \bar{x}_{0,i})\big)^2 = \frac{1}{2n}\sum_{i=1}^{n} \eta^2(c_1 h_i + c_2 \bar{x}_{0,i}; 1), \tag{12}$$

where we have denoted

$$\eta(x; \tau) := (x/|x|)\,(|x| - \tau)_+ \tag{13}$$

for the soft thresholding operator. Applying the weak law of large numbers shows that the limit of the
expression in (12) equals $F(c_1, c_2, c_3) := \frac{1}{2} E[\eta^2(c_1 h + c_2 \bar{X}_0; 1)]$, where the expectation is over $h \sim \mathcal{N}(0,1)$
and $\bar{X}_0 \sim p_{\bar{X}_0}$. With all these, Theorem 2.2 is applicable. We have put extra effort in order to obtain the following
equivalent but more insightful characterization of the error, as stated below and proved in the Appendix.

Fig. 2: Squared error of the LASSO as a function of the regularizer parameter compared to the asymptotic predictions.
Simulation points represent averages over 20 realizations. (a) Sparse signal recovery: illustration of Thm. 2.3 for $g(x) = \mathrm{sign}(x)$, $n = 512$,
$p_{\bar{X}_0}(+1) = p_{\bar{X}_0}(-1) = 0.05$, $p_{\bar{X}_0}(0) = 0.9$ and two values of $\delta$, namely 0.75 and 1.2. (b) Group-sparse signal recovery: illustration of
Thm. 2.2 for $x_0$ being group-sparse as in Section II-C2 and $g_i(x) = \mathrm{sign}(x + 0.3 z_i)$. In particular, $x_0$ is composed of $t = 512$ blocks of block size $b = 3$.
Each block is zero with probability 0.95, otherwise its entries are i.i.d. $\mathcal{N}(0,1)$. Finally, $\delta = 0.75$.
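The step behind (12) is that the squared distance from a scalar to the interval $[-1, 1]$ equals the square of its soft-thresholded value with threshold 1. The sketch below checks this identity numerically and estimates $F(c_1, c_2, c_3) = \frac{1}{2}E[\eta^2(c_1 h + c_2 \bar{X}_0; 1)]$ by Monte Carlo. The function names, the sampler interface `sample_X0(rng)`, and the trial count are assumptions made for this illustration.

```python
import random


def soft_threshold(x, tau):
    """eta(x; tau) = sign(x) * max(|x| - tau, 0), the operator in (13)."""
    if x > tau:
        return x - tau
    if x < -tau:
        return x + tau
    return 0.0


def sq_dist_to_unit_interval(a):
    """min over |v| <= 1 of (v - a)^2, attained by clipping a to [-1, 1]."""
    v = max(-1.0, min(1.0, a))
    return (v - a) ** 2


def estimate_F(c1, c2, sample_X0, trials=50_000, seed=0):
    """Monte Carlo estimate of 0.5 * E[eta^2(c1*h + c2*X0; 1)], h ~ N(0,1).

    sample_X0(rng) must return one draw of the scalar prior (hypothetical interface).
    c3 does not appear because the conjugate of the l1-norm is an indicator function.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        arg = c1 * rng.gauss(0.0, 1.0) + c2 * sample_X0(rng)
        total += soft_threshold(arg, 1.0) ** 2
    return 0.5 * total / trials
```

A sampler for the prior in (11) can be passed as `sample_X0`, for instance a function that returns 0 with probability $1 - \rho$ and a scaled Gaussian draw otherwise.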

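Theorem 2.3 (Sparse Recovery): If $\delta > 1$, then define $\lambda_{\text{crit}} = 0$. Otherwise, let $\lambda_{\text{crit}}$, $\kappa_{\text{crit}}$ be the unique pair of
solutions to the following set of equations:

$$\kappa^2\delta = \sigma^2 + E\big[(\eta(\kappa h + \mu\bar{X}_0; \kappa\lambda) - \mu\bar{X}_0)^2\big], \tag{14}$$

$$\kappa\delta = E\big[\eta(\kappa h + \mu\bar{X}_0; \kappa\lambda)\cdot h\big], \tag{15}$$

where $h \sim \mathcal{N}(0,1)$ and is independent of $\bar{X}_0 \sim p_{\bar{X}_0}$. Then, for any $\lambda > 0$, with probability one,

$$\lim_{n\to\infty} \|\hat{x} - \mu x_0\|_2^2 = \begin{cases} \delta\kappa_{\text{crit}}^2 - \sigma^2, & \lambda \le \lambda_{\text{crit}}, \\ \delta\kappa_*^2(\lambda) - \sigma^2, & \lambda \ge \lambda_{\text{crit}}, \end{cases}$$

where $\kappa_*^2(\lambda)$ is the unique solution to (14).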

Figures 1 and 2(a) validate the prediction of the theorem, for different signal distributions, namely $q_{\bar{X}_0}$ being
Gaussian and Bernoulli, respectively. For the case of compressed ($\delta < 1$) measurements, observe the two different
regimes of operation, one for $\lambda \le \lambda_{\text{crit}}$ and the other for $\lambda \ge \lambda_{\text{crit}}$, precisely as they are predicted by the theorem
(see also [OTH13, Sec. 8]). The special case of Theorem 2.3 for which $q_{\bar{X}_0}$ is Gaussian has been previously
studied in [TPGH15]. Otherwise, to the best of our knowledge, this is the first precise analysis result for the
$\ell_2$-LASSO stated in that generality. An analogous result, but via different analysis tools, has only been known for
the $\ell_2^2$-LASSO as appears in [BM12].
2) Group-Sparse Recovery: Let $x_0 \in \mathbb{R}^n$ be composed of $t$ non-overlapping blocks of constant size $b$ each
such that $n = t \cdot b$. Each block $[\bar{x}_0]_i$, $i = 1, \ldots, t$, is sampled i.i.d. from a probability density in $\mathbb{R}^b$:
$p_{\bar{X}_0}(x) = (1 - \rho)\cdot\delta_0(x) + \rho\cdot q_{\bar{X}_0}(x)$, $x \in \mathbb{R}^b$, where $\rho \in (0, 1)$. Thus, $x_0$ is $\rho t$-block-sparse on average. We operate
in the regime of linear measurements $m/n = \delta \in (0, \infty)$. As is common, we use the $\ell_{1,2}$-norm to induce
block-sparsity, i.e., $f(x) = \sum_{i=1}^{t} \|[x]_i\|_2$; with this, (9) is often referred to as group-LASSO in the literature
[YL06]. It is not hard to show that Assumption 1 holds with $F(c_1, c_2, c_3) := \frac{1}{2b} E\big[\|\vec{\eta}(c_1 h + c_2 \bar{X}_0; 1)\|_2^2\big]$, where
$\vec{\eta}(x; \tau) = \frac{x}{\|x\|_2}(\|x\|_2 - \tau)_+$, $x \in \mathbb{R}^b$, is the vector soft thresholding operator and $h \sim \mathcal{N}(0, I_b)$, $\bar{X}_0 \sim p_{\bar{X}_0}$
are independent. Thus Theorem 2.2 is applicable in this setting; Figure 2(b) illustrates the accuracy of the
prediction.
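The vector soft thresholding operator and the block norm used in the group-sparse setting can be sketched as follows. Function names are hypothetical, and the block layout (consecutive entries of length `b`) is an assumption made for the illustration.

```python
import math


def vector_soft_threshold(x, tau):
    """Shrink the Euclidean norm of the block x by tau, keeping its direction.

    Returns the zero vector when ||x||_2 <= tau.
    """
    norm = math.sqrt(sum(v * v for v in x))
    if norm <= tau:
        return [0.0] * len(x)
    scale = (norm - tau) / norm
    return [scale * v for v in x]


def group_norm(x, b):
    """l_{1,2} norm: sum of Euclidean norms of consecutive blocks of size b."""
    return sum(
        math.sqrt(sum(v * v for v in x[i:i + b])) for i in range(0, len(x), b)
    )
```

With block size $b = 1$ the operator reduces to the scalar soft thresholding in (13) and the block norm reduces to the $\ell_1$-norm.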

3) Low-rank Matrix Recovery: Let $X_0 \in \mathbb{R}^{d\times d}$ be an unknown matrix of rank $r$, in which case $x_0 = \mathrm{vec}(X_0)$
with $n = d^2$. Assume $m/d^2 = \delta \in (0, \infty)$ and $r/d = \rho \in (0, 1)$. As usual in this setting, we consider nuclear-norm
regularization; in particular, we choose $f(x) = \sqrt{d}\,\|X\|_*$. Each subgradient $S \in \partial f(X)$ then satisfies $\|S\|_F \le d$,
in agreement with assumption (b) of Section II-A. Furthermore, for this choice of regularizer, we have

$$\frac{1}{n}\,\mathrm{prox}_{\sqrt{n}(f^*)^{(n)},\,c_3}(c_1 H + c_2 X_0) = \frac{1}{2d^2}\min_{\|V\|_2 \le \sqrt{d}} \|V - (c_1 H + c_2 X_0)\|_F^2$$
$$= \frac{1}{2d}\min_{\|V\|_2 \le 1} \|V - d^{-1/2}(c_1 H + c_2 X_0)\|_F^2 = \frac{1}{2d}\sum_{i=1}^{d} \eta^2\big(s_i(d^{-1/2}(c_1 H + c_2 X_0)); 1\big),$$

where $\eta(\cdot; \cdot)$ is as in (13), $s_i(\cdot)$ denotes the $i$th singular value of its argument and $H \in \mathbb{R}^{d\times d}$ has entries $\mathcal{N}(0,1)$.
If conditions are met such that the empirical distribution of the singular values of (the sequence of random
matrices) $c_1 H + c_2 X_0$ converges asymptotically to a limiting distribution, say $q(c_1, c_2)$, then $F(c_1, c_2, c_3) :=
\frac{1}{2} E_{x \sim q(c_1, c_2)}[\eta^2(x; 1)]$, and Theorems 2.1-2.2 apply. For instance, this will be the case if $d^{-1/2} X_0 = U S V^T$, where
$U$, $V$ are unitary matrices and $S$ is a diagonal matrix whose entries have a given marginal distribution with bounded
moments (in particular, independent of $d$). We leave the details and the problem of (numerically) evaluating $F$
for future work.

D. An Application to q-bit Compressive Sensing 

1) Setup: Consider recovering a sparse unknown signal $x_0 \in \mathbb{R}^n$ from scalar $q$-bit quantized linear
measurements. Let $t := \{t_0 = 0, t_1, \ldots, t_{L-1}, t_L = +\infty\}$ represent a (symmetric with respect to 0) set of decision
thresholds and $\ell := \{\pm\ell_1, \pm\ell_2, \ldots, \pm\ell_L\}$ the corresponding representation points, such that $L = 2^{q-1}$. Then,
quantization of a real number $x$ into $q$ bits can be represented as

$$Q_q(x, \ell, t) = \mathrm{sign}(x)\sum_{i=1}^{L} \ell_i\, \mathbb{1}_{\{t_{i-1} \le |x| < t_i\}},$$

where $\mathbb{1}_S$ is the indicator function of a set $S$. For example, 1-bit quantization with level $\ell$ corresponds to $Q_1(x, \ell) =
\ell \cdot \mathrm{sign}(x)$. The measurement vector $y = [y_1, y_2, \ldots, y_m]^T$ takes the form

$$y_i = Q_q(a_i^T x_0, \ell, t), \quad i = 1, 2, \ldots, m, \tag{16}$$

where the $a_i^T$'s are the rows of a measurement matrix $A \in \mathbb{R}^{m\times n}$, which is henceforth assumed i.i.d. standard
Gaussian. We use the LASSO to obtain an estimate $\hat{x}$ of $x_0$ as

$$\hat{x} := \arg\min_x \|y - Ax\|_2 + \lambda\|x\|_1. \tag{17}$$
Henceforth, we assume for simplicity that $\|x_0\|_2 = 1$. Also, in our case, $g$ is known since $g = Q_q$ is known; thus, it
is reasonable to scale the solution of (17) as $\mu^{-1}\hat{x}$ and consider the error quantity $\|\mu^{-1}\hat{x} - x_0\|_2$ as a measure of
estimation performance. Clearly, the error depends (among other things) on the number of bits $q$, on the choice of the
decision thresholds $t$ and on the quantization levels $\ell$. An interesting question of practical importance is
how to choose these optimally so as to minimize the error. As a running example for this section, we seek optimal
quantization thresholds and corresponding levels

$$(t_*, \ell_*) = \arg\min_{t, \ell} \|\mu^{-1}\hat{x} - x_0\|_2, \tag{18}$$

while keeping all other parameters such as the number of bits $q$ and of measurements $m$ fixed.
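The $q$-bit quantizer $Q_q$ defined in the Setup above can be sketched as follows. This is an illustration under stated assumptions: only the $L$ positive levels are passed (the negative ones follow by symmetry), cells are taken half-open as $[t_{i-1}, t_i)$, and $\mathrm{sign}(0)$ is set to $+1$; these conventions and the function name are choices made here, not taken from the paper.

```python
import math


def quantize(x, levels, thresholds):
    """Symmetric scalar quantizer Q_q(x, l, t).

    levels     : [l_1, ..., l_L], the positive representation points
    thresholds : [t_0 = 0, t_1, ..., t_L = +inf], the decision thresholds
    """
    if len(thresholds) != len(levels) + 1:
        raise ValueError("need exactly one more threshold than levels")
    s = 1.0 if x >= 0 else -1.0
    a = abs(x)
    for i, level in enumerate(levels):
        if thresholds[i] <= a < thresholds[i + 1]:
            return s * level
    raise ValueError("thresholds must cover [0, +inf)")


# 1-bit quantizer with level 2: Q_1(x, l) = l * sign(x).
ONE_BIT = ([2.0], [0.0, math.inf])
# A 2-bit example with L = 2 levels per side (values chosen arbitrarily).
TWO_BIT = ([0.5, 1.5], [0.0, 1.0, math.inf])
```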

2) Consequences of Precise Error Prediction: Theorem 2.1 shows that $\|\mu^{-1}\hat{x} - x_0\|_2 = \|\hat{x}^{\text{lin}} - x_0\|_2$, where
$\hat{x}^{\text{lin}}$ is the solution to (17), but this time with a measurement vector $y^{\text{lin}} = Ax_0 + \frac{\sigma}{\mu}z$, where $\mu$, $\sigma$ are as in (20)
and $z$ has entries i.i.d. standard normal. Thus, lower values of the ratio $\sigma^2/\mu^2$ correspond to lower values of the
error and the design problem posed in (18) is equivalent to the following simplified one:

$$(t_*, \ell_*) = \arg\min_{t, \ell} \frac{\sigma^2(t, \ell)}{\mu^2(t, \ell)}. \tag{19}$$

To be explicit, $\mu$ and $\sigma^2$ above can be easily expressed from (3) after setting $g = Q_q$ as follows:

$$\mu := \mu(\ell, t) = \sqrt{\frac{2}{\pi}}\sum_{i=1}^{L} \ell_i\left(e^{-t_{i-1}^2/2} - e^{-t_i^2/2}\right) \quad \text{and} \quad \sigma^2 := \sigma^2(\ell, t) := \tau^2 - \mu^2, \tag{20}$$

where $\tau^2 := \tau^2(\ell, t) = 2\sum_{i=1}^{L} \ell_i^2\,\big(Q(t_{i-1}) - Q(t_i)\big)$ and $Q(x) = \frac{1}{\sqrt{2\pi}}\int_x^{\infty} \exp(-u^2/2)\,du$.
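The quantities $\mu$, $\tau^2$ and $\sigma^2$ that enter the design criterion (19) are closed-form in the levels and thresholds. The sketch below evaluates them under the reading $\mu = \sqrt{2/\pi}\sum_i \ell_i(e^{-t_{i-1}^2/2} - e^{-t_i^2/2})$, $\tau^2 = 2\sum_i \ell_i^2(Q(t_{i-1}) - Q(t_i))$ and $\sigma^2 = \tau^2 - \mu^2$, which follows from (3) with $g = Q_q$. Function names are hypothetical; the Gaussian tail $Q$ is computed through the complementary error function.

```python
import math


def gaussian_tail(x):
    """Q(x) = P(gamma > x) for gamma ~ N(0, 1)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def quantizer_parameters(levels, thresholds):
    """Return (mu, tau^2, sigma^2) for a symmetric quantizer.

    levels     : [l_1, ..., l_L], positive representation points
    thresholds : [t_0 = 0, t_1, ..., t_L = +inf]
    """
    mu = 0.0
    tau2 = 0.0
    for i, level in enumerate(levels):
        lo, hi = thresholds[i], thresholds[i + 1]
        mu += level * (math.exp(-lo * lo / 2.0) - math.exp(-hi * hi / 2.0))
        tau2 += level * level * (gaussian_tail(lo) - gaussian_tail(hi))
    mu *= math.sqrt(2.0 / math.pi)
    tau2 *= 2.0
    return mu, tau2, tau2 - mu * mu


def design_cost(levels, thresholds):
    """The objective sigma^2 / mu^2 of the design problem (19)."""
    mu, _, sigma2 = quantizer_parameters(levels, thresholds)
    return sigma2 / (mu * mu)
```

For the 1-bit quantizer with unit level the formulas reduce to the sign-link values $\mu = \sqrt{2/\pi}$ and $\sigma^2 = 1 - 2/\pi$, and the cost in (19) does not change when all levels are rescaled by a common factor.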


3) An Algorithm for Finding Optimal Quantization Levels and Thresholds: In contrast to the initial problem
in (18), the optimization involved in (19) is explicit in terms of the variables $\ell$ and $t$, but is still hard to solve in
general. Interestingly, we show in the Appendix that the popular Lloyd-Max (LM) algorithm can be an effective
algorithm for solving (19), since the values to which it converges are stationary points of the objective in (19).
Note that this is not a directly obvious result since the classical objective of the LM algorithm is minimizing the
quantity $E[\|y - Ax_0\|_2^2]$ rather than $E[\|\mu^{-1}\hat{x} - x_0\|_2^2]$.
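For reference, the classical Lloyd-Max iteration for a standard normal input can be sketched as below. This is the textbook alternation (thresholds at midpoints of adjacent levels, levels at conditional means of their cells) specialized to a symmetric quantizer with $L$ positive levels; it is not taken from the paper's Appendix, and the function names, initialization and iteration count are assumptions.

```python
import math


def normal_pdf(x):
    """Standard normal density; evaluates to 0.0 at x = +inf."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def normal_tail(x):
    """P(gamma > x) for gamma ~ N(0, 1)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def midpoint_thresholds(levels):
    """Thresholds t_0 = 0 < t_1 < ... < t_L = +inf at midpoints of the levels."""
    mids = [(levels[i] + levels[i + 1]) / 2.0 for i in range(len(levels) - 1)]
    return [0.0] + mids + [math.inf]


def lloyd_max(L, iters=500):
    """Lloyd-Max levels and thresholds for |gamma|, gamma ~ N(0, 1)."""
    levels = [i + 0.5 for i in range(L)]  # arbitrary increasing starting point
    for _ in range(iters):
        t = midpoint_thresholds(levels)
        # Conditional mean of gamma on [t_i, t_{i+1}):
        # (pdf(t_i) - pdf(t_{i+1})) / (tail(t_i) - tail(t_{i+1})).
        levels = [
            (normal_pdf(t[i]) - normal_pdf(t[i + 1]))
            / (normal_tail(t[i]) - normal_tail(t[i + 1]))
            for i in range(L)
        ]
    return levels, midpoint_thresholds(levels)
```

Calling `lloyd_max(2)` gives a 2-bit design (two levels per side); the returned levels and thresholds can be fed to any routine that evaluates the ratio in (19) to compare against hand-picked designs.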
A. Theorem 2.2

We start with the proof of Theorem 2.2. Theorem 2.1 will follow as a direct corollary of this result.

Assume a sequence of problem instances as described in Section II-A. To keep notation simple, we simply use
$\|v\|$ (rather than $\|v\|_2$) for the Euclidean norm of $v$ and we shall also drop the superscript $(n)$ when referring to
elements of the sequence. Thus, we write

$$\hat{x} = \arg\min_x \frac{1}{\sqrt{n}}\|\vec{g}(Ax_0) - Ax\| + \frac{\lambda}{\sqrt{n}} f(x), \tag{21}$$

but it is to be understood that the above actually produces a sequence of solutions $\hat{x}^{(n)}$ indexed by $n$. Our goal
is to characterize the nontrivial limiting behavior of $\|\hat{x} - \mu x_0\|_2$.

We start with a simple but useful change of variables $w := x - \mu x_0$, to directly have a handle on the error
vector $w$. Then, (21) becomes:

$$\hat{w} := \arg\min_w \frac{1}{\sqrt{n}}\|\vec{g}(Ax_0) - \mu Ax_0 - Aw\| + \frac{\lambda}{\sqrt{n}} f(\mu x_0 + w)$$
$$= \arg\min_w \max_{\|u\| \le 1} \frac{1}{\sqrt{n}}\big(-u^T Aw + u^T(\vec{g}(Ax_0) - \mu Ax_0)\big) + \frac{\lambda}{\sqrt{n}} f(\mu x_0 + w), \tag{22}$$

where the second line follows after using the fact $\|v\| = \max_{\|u\|_2 \le 1} u^T v$.

1) A Key Decomposition: The first key step in the proof is a trick adapted from the proofs of [PV15, Lem. 4.3] and [PVY14, Thm. 1.3]. Until further notice, we condition on $x_0$. Also, we repeatedly make use of the assumption that $\|x_0\| = 1$ without direct reference. The trick amounts to decomposing each measurement vector $a_i$ into its projection on the direction of $x_0$ and its orthogonal complement. Denoting $P^\perp = (I - x_0x_0^T)$ for the projector onto the orthogonal complement of the span of $x_0$ (recall $\|x_0\|_2 = 1$), we have $a_i^T = (a_i^Tx_0)x_0^T + a_i^TP^\perp$, or, in matrix form:

$$A = (Ax_0)x_0^T + AP^\perp.$$
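The decomposition can be checked numerically. The sketch below, in plain Python with hypothetical helper names, draws a small Gaussian matrix and verifies that $Aw = (Ax_0)(x_0^Tw) + AP^\perp w$ for a random $w$, and that $P^\perp w$ is orthogonal to $x_0$. The dimensions and the seed are arbitrary.

```python
import math
import random

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def project_orthogonal(vec, x0):
    """Return P_perp vec = vec - (x0^T vec) x0 for a unit-norm x0."""
    c = dot(vec, x0)
    return [vi - c * xi for vi, xi in zip(vec, x0)]

def decomposition_gap(A, x0, w):
    """Largest entry of |A w - ((A x0)(x0^T w) + A P_perp w)|."""
    w_perp = project_orthogonal(w, x0)
    coeff = dot(x0, w)
    gap = 0.0
    for row in A:
        direct = dot(row, w)
        split = dot(row, x0) * coeff + dot(row, w_perp)
        gap = max(gap, abs(direct - split))
    return gap

rng = random.Random(0)
n, m = 6, 4
x0 = [rng.gauss(0.0, 1.0) for _ in range(n)]
norm = math.sqrt(dot(x0, x0))
x0 = [xi / norm for xi in x0]          # enforce ||x0|| = 1
A = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(m)]
w = [rng.gauss(0.0, 1.0) for _ in range(n)]

gap = decomposition_gap(A, x0, w)
orthogonality = abs(dot(project_orthogonal(w, x0), x0))
```

Both `gap` and `orthogonality` should vanish up to floating-point rounding.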


Then, (22) becomes:

$$\min_w \max_{\|u\|\le 1}\ \frac{1}{\sqrt{n}}\left(-u^TAP^\perp w + u^T\left(g(Ax_0) - \mu Ax_0 - (Ax_0)x_0^Tw\right)\right) + \frac{\lambda}{\sqrt{n}} f(\mu x_0 + w). \tag{23}$$

Now, we use the Gaussianity assumption on the entries of $A$ to see that $AP^\perp$ is independent of $Ax_0$. It can then be shown (see [PV15, pg. 13]) that $AP^\perp$ is also independent of $(g(Ax_0) - \mu Ax_0)$; thus, $AP^\perp w$ is independent of the remaining terms in (23). This shows that the objective function of (23) is distributed identically even after replacing $AP^\perp w$ with $GP^\perp w$, where $G$ is an independent copy of $A$. After all these, (23) is identically distributed with the following:

$$\min_w \max_{\|u\|\le 1}\ \frac{1}{\sqrt{n}}\left\{-u^TGP^\perp w + u^T\left(z_e - (x_0^Tw)e\right)\right\} + \frac{\lambda}{\sqrt{n}} f(\mu x_0 + w), \tag{24}$$

where $G$ and $e := Ax_0$ have entries i.i.d. standard normal and are independent of each other. Also, $z_e := g(e) - \mu e$ for convenience.
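To make the role of $z_e = g(e) - \mu e$ concrete, the following Monte Carlo sketch estimates $\mu = \mathbb{E}[\gamma g(\gamma)]$ and $\sigma^2 = \mathbb{E}[(g(\gamma) - \mu\gamma)^2]$ for one example link function, $g = \mathrm{sign}$, and checks that the residual is (empirically) uncorrelated with $\gamma$. The choice of link function, the sample size, and the function name are assumptions made for illustration only.

```python
import math
import random

def link_moments(g, num_samples=200_000, seed=0):
    """Monte Carlo estimates of mu, sigma^2 and E[(g(gamma) - mu*gamma) * gamma]."""
    rng = random.Random(seed)
    gammas = [rng.gauss(0.0, 1.0) for _ in range(num_samples)]
    values = [g(x) for x in gammas]
    mu = sum(x * v for x, v in zip(gammas, values)) / num_samples
    residuals = [v - mu * x for x, v in zip(gammas, values)]
    sigma2 = sum(r * r for r in residuals) / num_samples
    cross = sum(r * x for r, x in zip(residuals, gammas)) / num_samples
    return mu, sigma2, cross

def sign_link(x):
    return 1.0 if x >= 0.0 else -1.0

mu_hat, sigma2_hat, cross_hat = link_moments(sign_link)
```

For the sign link the exact values are $\mu = \sqrt{2/\pi}$ and $\sigma^2 = 1 - 2/\pi$, so the estimates should land close to these, with the cross term near zero.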
2) Applying the cGMT: After the decomposition step in the previous section, we have transformed the initial problem to that of analyzing the (probabilistically) equivalent one in (24). In particular, we wish to evaluate the limiting behavior of $\|\hat{w}\|_2$, i.e. the norm of the minimizer of the optimization in (24). The analysis is possible thanks to the convex Gaussian Min-max Theorem (cGMT) [TOH, Thm. 1], which is a stronger version of the classical result of Gordon [Gor85] in the presence of additional convexity assumptions. According to the cGMT, the analysis of a Primary Optimization (PO) problem that is of the form

$$\min_{v\in S_v}\max_{u\in S_u}\ u^TGv + \psi(v,u), \tag{25}$$

with $G$ being i.i.d. Gaussian, $S_v, S_u$ convex, compact sets and $\psi$ a convex-concave function, can be carried out via analyzing a corresponding Auxiliary Optimization problem (AO), which is defined as

$$\min_{v\in S_v}\max_{u\in S_u}\ \|v\|g^Tu + \|u\|h^Tv + \psi(v,u). \tag{26}$$

In (26), $g$ and $h$ are i.i.d. standard Gaussian vectors of appropriate size. To apply the theorem, identify $v := P^\perp w$ in (24) and the appearance of the bilinear term $u^TGv$ as in (25). Also, the rest of the objective function in (24) is convex in $P^\perp w$ (where we have used the convexity of $f$) and linear (thus, concave) in $u$. Overall, (24) is in the appropriate format of a (PO) problem as in (25). The only technical caveat is that the minimization over $w$ in it appears unconstrained. For this, we assume that the minimizer of (24) satisfies $\|\hat{w}\| \le K_w$ for a sufficiently large constant $K_w > 0$ independent of $n$. If our assumption is valid, then by the end of the proof we will have identified a quantity $\alpha_* > 0$ to which $\|\hat{w}\|$ converges. If $\alpha_*$ turns out to be independent of the choice of $K_w$, then we may explicitly choose $K_w = 2\alpha_*$ (say) and $\alpha_*$ is the true limit; on the other hand, if $\alpha_*$ turns out to depend on $K_w$, this means that we could have chosen $K_w$ arbitrarily large in the first place, and so the true limit diverges. Thus, under this assumption on $\|\hat{w}\|$, the minimization in (24) is not affected by imposing the constraint $\|w\| \le K_w$. With these, we can write the corresponding (AO) problem as

$$\hat{w} = \arg\min_{\|w\|\le K_w}\max_{\|u\|\le 1}\ \frac{1}{\sqrt{n}}\left\{\|P^\perp w\|g^Tu - \|u\|h^TP^\perp w + u^T\left(z_e - (x_0^Tw)e\right)\right\} + \frac{\lambda}{\sqrt{n}} f(\mu x_0 + w). \tag{27}$$

We will see that analyzing this problem is simpler than the (PO) (and certainly simpler than the one we started with in (22)).

3) Analysis of the Auxiliary Optimization: The goal of this section is analyzing the (AO) problem in (27). In
particular, we will prove i) the optimal cost of the (AO) problem converges to the optimal cost of the deterministic
optimization in (10), which involves three scalar optimization variables $\alpha, \beta, \tau$, ii) the max-min problem in (10) is
strongly convex in $\alpha$ and jointly concave in $\beta, \tau$, iii) $\|\hat{w}\|$ converges to the unique optimum $\alpha_*$ in (10). With these,
the claim of the Theorem follows by [TOH, Thm. 1] (also, see [TOH14, Cor. A.1]), as previously discussed.

The analysis requires several steps. The randomness in (27) is over $e$, $g$, $h$, $x_0$ and possibly the link function
$g$: at each step we condition on all but a subset of these and identify convergence of the objective function of the
(AO) with respect to the remaining. Pointwise convergence (with respect to the involved optimization variables) 
needs to be turned into uniform convergence to guarantee that not only the objective function, but also the min/max 
value and the optimizer converge appropriately. (Strong) convexity of the objective will turn out to be crucial for 
this. 

Introducing the Fenchel conjugate. To begin with, let us rewrite the (AO) problem above by expressing $f$ in terms of its Fenchel conjugate, i.e.

$$f(x) = \sup_v\ v^Tx - f^*(v) = \sup_v\ \sqrt{n}\,v^Tx - f^*(\sqrt{n}\,v). \tag{28}$$
Translating to our problem and after rescaling this gives,

$$n^{-1/2}f(\mu x_0 + w) = \sup_v\ v^T(\mu x_0 + w) - n^{-1/2}f^*(\sqrt{n}\,v). \tag{29}$$

Now, from standard optimality conditions of (28), the optimal $v_*$ satisfies $v_* \in \partial f(x)$. Then, using condition (b) of Section II-A, $\|v_*\| = O(\sqrt{n})$ for all $x$ such that $\|x\| = O(1)$. From this, and $\|w + \mu x_0\| = O(1)$, we conclude that the optimal $v_*$ in (29) satisfies $\|v_*\| \le K_v < \infty$ for a sufficiently large constant $K_v$ independent of $n$. Putting everything together, (27) is equivalent to

$$\min_{\|w\|\le K_w}\ \max_{\substack{\|u\|\le 1 \\ 0\le\|v\|\le K_v}}\ \frac{1}{\sqrt{n}}u^T\left(z_e - (x_0^Tw)e - \|P^\perp w\|g\right) - \|u\|h^TP^\perp w + \lambda v^T(\mu x_0 + w) - \lambda f^*(v), \tag{30}$$

where we have also denoted $h := n^{-1/2}h$ and $f^*(v) := n^{-1/2}f^*(\sqrt{n}\,v)$. Observe again that by condition (b) of Section II-A, $f^*(v) = \max_x\ x^Tv - n^{-1/2}f(x) = O(1)$ since $\|v\| = O(1)$.

In order to somewhat simplify the exposition, we often omit explicitly carrying over the constraints $\|w\| \le K_w$,
$\|v\| \le K_v$ until the very last step, but we often recall and actually make use of them.

Optimizing over the direction of $u$. Observe that maximization over the direction of $u$ is easy in (30), which then becomes:

$$\min_w\ \max_{0\le\beta\le 1,\ v}\ \frac{\beta}{\sqrt{n}}\left\|z_e - (x_0^Tw)e - \|P^\perp w\|g\right\| - \beta h^TP^\perp w + \lambda v^T(\mu x_0 + w) - \lambda f^*(v). \tag{31}$$

Now, observe that the objective function above is convex in $w$ and jointly concave in $\beta, v$ (recall $f^*$ is convex). Furthermore, the constraint sets are convex and compact. Hence, we can flip the order of min-max as in [Roc97, Cor. 37.3.2]:

$$\max_{0\le\beta\le 1,\ v}\ \min_w\ \frac{\beta}{\sqrt{n}}\left\|z_e + (x_0^Tw)e - \|P^\perp w\|g\right\| - \beta h^TP^\perp w + \lambda v^T(\mu x_0 + w) - \lambda f^*(v)$$

$$= \max_{0\le\beta\le 1,\ v}\ \min_{\alpha_1,\alpha_2\ge 0}\ \frac{\beta}{\sqrt{n}}\left\|z_e + \alpha_2e - \alpha_1g\right\| - \max_{\substack{\|P^\perp w\| = \alpha_1 \\ x_0^Tw = \alpha_2}}\left\{\beta h^TP^\perp w - \lambda v^T(\mu x_0 + w) + \lambda f^*(v)\right\}.$$

By decomposing $w$ as $P^\perp w + (x_0^Tw)x_0$, it is not hard to perform the maximization over $w$ to equivalently write the last display above as:

$$\max_{0\le\beta\le 1,\ v}\ \min_{\alpha_1,\alpha_2\ge 0}\ \frac{\beta}{\sqrt{n}}\left\|z_e + \alpha_2e - \alpha_1g\right\| - \alpha_1\|\beta P^\perp h - \lambda P^\perp v\| + \lambda\mu v^Tx_0 + \alpha_2\lambda(v^Tx_0) - \lambda f^*(v). \tag{32}$$


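The randomness of $e$, the Gaussian vector $g$ and the link function $g$. Until further notice condition on $h$ and $x_0$. All randomness in (32) is now in the first term.

Consider $\beta, v$ fixed for now. For any pair $\alpha_1, \alpha_2$, by the WLLN, $\frac{1}{m}\|z_e + \alpha_2e - \alpha_1g\|^2 \to \mathbb{E}[(g(\gamma) - \mu\gamma + \alpha_2\gamma - \alpha_1\gamma')^2]$, where $\gamma, \gamma' \sim \mathcal{N}(0,1)$ and independent. Recall $\mathbb{E}[(g(\gamma) - \mu\gamma)^2] = \sigma^2$, $\mathbb{E}[(g(\gamma) - \mu\gamma)\gamma] = \mu - \mu = 0$ and $m/n = \delta$, to conclude that $n^{-1/2}\|z_e + \alpha_2e - \alpha_1g\| \to \sqrt{\delta}\sqrt{\sigma^2 + \alpha_1^2 + \alpha_2^2}$, where convergence is point-wise in $\alpha_1, \alpha_2$. The objective function in (32) is jointly convex in $[\alpha_1, \alpha_2]$. Lastly, the function $\sqrt{\sigma^2 + \alpha_1^2 + \alpha_2^2}$ can be shown (by direct differentiation) to be jointly strongly convex over $[\alpha_1, \alpha_2]$. With these, we use [NM94, Thm. 2.7] to conclude that (for any $\beta, v$) i) the minimum over $\alpha_1, \alpha_2$ in (32) converges to

$$\min_{\alpha_1,\alpha_2\ge 0}\ \beta\sqrt{\delta}\sqrt{\sigma^2 + \alpha_1^2 + \alpha_2^2} - \alpha_1\|\beta P^\perp h - \lambda P^\perp v\| + \lambda\mu v^Tx_0 + \alpha_2\lambda(v^Tx_0) - \lambda f^*(v), \tag{33}$$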
and, ii) the optimal $\alpha_1, \alpha_2$ of (32) converge to the unique (by strong convexity) optimal of (33).

Up to now, $\beta, v$ were assumed fixed and the convergence from (32) to (33) holds point-wise with respect to $\beta, v$. The objective function in (32) is jointly concave with respect to $\beta, v$. Thus, (32) converges to

$$\max_{0\le\beta\le 1,\ v}\ \min_{\alpha_1,\alpha_2\ge 0}\ \beta\sqrt{\delta}\sqrt{\sigma^2 + \alpha_1^2 + \alpha_2^2} - \alpha_1\|\beta P^\perp h - \lambda P^\perp v\| + \lambda\mu v^Tx_0 + \alpha_2\lambda(v^Tx_0) - \lambda f^*(v), \tag{34}$$

and the optimal $\alpha_1, \alpha_2$ of the former converge to the corresponding optima of the latter.

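Merging $\alpha_1$ and $\alpha_2$. It is important to note that $\alpha_1^2 + \alpha_2^2$ in (34) corresponds exactly to the squared norm of the error. Here, we simplify (34) by introducing the quantity $\sqrt{\alpha_1^2 + \alpha_2^2}$ as the minimization variable rather than separately $\alpha_1$ and $\alpha_2$. By first order optimality conditions in (34) we find

$$\alpha_1\beta\sqrt{\delta} = \|\beta P^\perp h - \lambda P^\perp v\|\sqrt{\alpha_1^2 + \alpha_2^2 + \sigma^2} \quad\text{and}\quad \alpha_2\beta\sqrt{\delta} = -\lambda v^Tx_0\sqrt{\alpha_1^2 + \alpha_2^2 + \sigma^2}. \tag{35}$$

Substituting this in (34), the objective becomes (ignoring the terms that do not involve $\alpha_1$ or $\alpha_2$):

$$\beta\sqrt{\delta}\sqrt{\sigma^2 + \alpha_1^2 + \alpha_2^2} - \frac{\sqrt{\sigma^2 + \alpha_1^2 + \alpha_2^2}}{\beta\sqrt{\delta}}\left(\|\beta P^\perp h - \lambda P^\perp v\|^2 + (\lambda v^Tx_0)^2\right).$$

But, from (35) we find $\sqrt{\sigma^2 + \alpha_1^2 + \alpha_2^2}\sqrt{\|\beta P^\perp h - \lambda P^\perp v\|^2 + (\lambda v^Tx_0)^2} = \beta\sqrt{\delta}\sqrt{\alpha_1^2 + \alpha_2^2}$. Combining, we conclude that (34) can be written as

$$\max_{0\le\beta\le 1,\ v}\ \min_{\alpha\ge 0}\ \beta\sqrt{\delta}\sqrt{\alpha^2 + \sigma^2} - \alpha\|\beta P^\perp h - \lambda v\| + \lambda\mu v^Tx_0 - \lambda f^*(v), \tag{36}$$

where the new optimization variable $\alpha$ plays the role of $\sqrt{\alpha_1^2 + \alpha_2^2}$, thus it represents the norm of the error vector $\|w\|$. We have also identified $\|\beta P^\perp h - \lambda P^\perp v\|^2 + (\lambda v^Tx_0)^2 = \|\beta P^\perp h - \lambda v\|^2$.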
Introducing a new optimization variable. To get a better handle on it, we square the norm term in (36) at the expense of introducing a new scalar optimization variable. This is based on the following trick:

$$\sqrt{x} = \min_{\tau>0}\ \frac{\tau}{2} + \frac{x}{2\tau} \tag{37}$$

for any $x > 0$. Thus, (36) becomes

$$\max_{\substack{0\le\beta\le 1 \\ v,\ \tau\ge 0}}\ \min_{\alpha\ge 0}\ \beta\sqrt{\delta}\sqrt{\alpha^2 + \sigma^2} - \frac{\alpha\tau}{2} - \frac{\alpha}{2\tau}\|\beta P^\perp h - \lambda v\|^2 + \lambda\mu v^Tx_0 - \lambda f^*(v), \tag{38}$$

where we have also flipped the order of min-max between $\alpha$ and $\tau$. We could do this as in [Roc97, Cor. 37.3.2] since the objective is convex in $\alpha$ and concave in $\tau$, the constraint sets are both convex and both of them are bounded. To argue the boundedness, recall that $\alpha \le K_w$; for $\tau$ it suffices to combine optimality conditions of (37) and boundedness of $v$, $\|v\|_2 \le K_v$.
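The variational identity (37) is easy to sanity-check numerically. The sketch below (function names are illustrative) evaluates $\tau/2 + x/(2\tau)$ on a fine grid and compares the smallest value with $\sqrt{x}$; the minimizer is $\tau = \sqrt{x}$, so the grid is chosen to contain it.

```python
import math

def variational_objective(tau, x):
    """The function of tau minimized in (37)."""
    return tau / 2.0 + x / (2.0 * tau)

def sqrt_by_grid_search(x, num_points=100_001):
    """Approximate min over tau > 0 of tau/2 + x/(2 tau) on a uniform grid."""
    upper = 2.0 * math.sqrt(x) + 1.0   # the minimizer sqrt(x) lies in (0, upper]
    return min(
        variational_objective(upper * k / num_points, x)
        for k in range(1, num_points + 1)
    )

approx = sqrt_by_grid_search(9.0)
exact_at_minimizer = variational_objective(math.sqrt(9.0), 9.0)
```

The grid minimum can only overshoot $\sqrt{x}$, and by a very small amount, since the objective is smooth and convex around its minimizer.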

Optimizing over $v$. Note that the objective in (38) is concave in $v$, convex in $\alpha$ and the constraint sets are convex compact. Thus, as it might be expected by now, we use [Roc97, Cor. 37.3.2] to flip the corresponding order of max-min. Also, after some simple algebra while using $P^\perp x_0 = 0$ and $\|x_0\| = 1$, it can be shown that

$$\|\beta P^\perp h - \lambda v\|^2 - 2\frac{\tau}{\alpha}\lambda\mu v^Tx_0 = \left\|\lambda v - \left(\beta P^\perp h + \frac{\tau}{\alpha}\mu x_0\right)\right\|^2 - \mu^2\frac{\tau^2}{\alpha^2}.$$
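The completing-the-square identity above relies only on $P^\perp x_0 = 0$ and $\|x_0\| = 1$. As a quick check under arbitrary (hypothetical) parameter values, the sketch below evaluates both sides for random vectors.

```python
import math
import random

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def completing_square_sides(h, v, x0, beta, lam, mu, tau, alpha):
    """Return (lhs, rhs) of the identity used before (39)."""
    c = dot(h, x0)
    p_h = [hi - c * xi for hi, xi in zip(h, x0)]          # P_perp h
    ratio = tau / alpha
    diff = [beta * p - lam * vi for p, vi in zip(p_h, v)]
    lhs = dot(diff, diff) - 2.0 * ratio * lam * mu * dot(v, x0)
    center = [beta * p + ratio * mu * xi for p, xi in zip(p_h, x0)]
    shifted = [lam * vi - ci for vi, ci in zip(v, center)]
    rhs = dot(shifted, shifted) - (mu * ratio) ** 2
    return lhs, rhs

rng = random.Random(1)
n = 7
x0 = [rng.gauss(0.0, 1.0) for _ in range(n)]
scale = math.sqrt(dot(x0, x0))
x0 = [xi / scale for xi in x0]                            # unit norm
h = [rng.gauss(0.0, 1.0) for _ in range(n)]
v = [rng.gauss(0.0, 1.0) for _ in range(n)]

lhs, rhs = completing_square_sides(h, v, x0, beta=0.7, lam=1.3, mu=0.8, tau=0.5, alpha=0.9)
```

The two sides agree up to rounding for any choice of the scalars with $\alpha > 0$.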
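Combining, we conclude with

$$(38) = \max_{\substack{0\le\beta\le 1 \\ \tau\ge 0}}\ \min_{\alpha\ge 0}\ \beta\sqrt{\delta}\sqrt{\alpha^2 + \sigma^2} - \frac{\alpha\tau}{2} + \frac{\mu^2\tau}{2\alpha} - \frac{\alpha\lambda^2}{\tau}\min_v\left\{\frac{1}{2}\left\|v - \left(\frac{\beta}{\lambda}P^\perp h + \frac{\tau\mu}{\alpha\lambda}x_0\right)\right\|^2 + \frac{\tau}{\alpha\lambda}f^*(v)\right\}$$

$$= \max_{\substack{0\le\beta\le 1 \\ \tau\ge 0}}\ \min_{\alpha\ge 0}\ G(\alpha,\beta,\tau). \tag{39}$$

Here, $G(\alpha,\beta,\tau)$ is convex in $\alpha$ (see (36)) and jointly concave in $\beta, \tau$. To see the latter it suffices to show that $\frac{1}{\tau}\left\|\lambda v - \left(\beta P^\perp h + \frac{\tau\mu}{\alpha}x_0\right)\right\|^2$ is jointly convex over $\beta, \tau, v$ (minimization over $v$ does not change the joint convexity over $\tau$ and $\beta$). The squared norm is separable over its entries, so we equivalently show that for scalars $\tau, \beta, v$, the function $\frac{1}{\tau}(v - c_1\beta - c_2\tau)^2$ is jointly convex over $\tau > 0$, $\beta$, $v$; this is true since it is the perspective function of $(v - c_1\beta - c_2)^2$.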

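The randomness of $h$ and $x_0$. For now, fix any $\beta, \tau$ and let $\hat\alpha := \hat\alpha(\beta,\tau)$ be the minimizer in (39).

First, we prove that $\hat\alpha(\beta,\tau) > 0$. For any $\alpha > 0$, by choosing $v = \min\{K, \frac{\tau\mu}{\alpha\lambda}\}x_0 =: \theta x_0$ where $0 < K \le K_v$ such that $v \in \operatorname{dom} f^*$, we find

$$\min_v\left\{\frac{\alpha\lambda^2}{2\tau}\left\|v - \left(\frac{\beta}{\lambda}P^\perp h + \frac{\tau\mu}{\alpha\lambda}x_0\right)\right\|^2 + \lambda f^*(v)\right\} \le \frac{\alpha\lambda^2}{2\tau}\left\|\theta x_0 - \left(\frac{\beta}{\lambda}P^\perp h + \frac{\tau\mu}{\alpha\lambda}x_0\right)\right\|^2 + \lambda f^*(\theta x_0)$$

$$= \frac{\mu^2\tau}{2\alpha}\left(1 - \frac{\theta\alpha\lambda}{\tau\mu}\right)^2 + \frac{\alpha}{2\tau}\beta^2\|P^\perp h\|^2 + \lambda f^*(\theta x_0). \tag{40}$$

Thus, the value of the objective in (39) is lower bounded by

$$\beta\sqrt{\delta}\sqrt{\sigma^2 + \alpha^2} - \frac{\alpha\tau}{2} + \frac{\mu^2\tau}{2\alpha}\left[1 - \left(1 - \frac{\theta\alpha\lambda}{\tau\mu}\right)^2\right] - \frac{\alpha}{2\tau}\beta^2\|P^\perp h\|^2 - \lambda f^*(\theta x_0),$$

which goes to $+\infty$ as $\alpha \to 0$, since by definition $\theta \le \frac{\tau\mu}{\alpha\lambda}$. Hence, $\hat\alpha > 0$, as desired.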


Now, fix $\beta, \tau, \alpha > 0$, denote $c_1 = \frac{\beta}{\lambda}$, $c_2 = \frac{\tau\mu}{\alpha\lambda}$, $c_3 = \frac{\tau}{\alpha\lambda}$ and consider

$$R(h,x_0) := R(\alpha,\beta,\tau;h,x_0) := \min_v\ \frac{1}{2}\|v - c_1P^\perp h - c_2x_0\|^2 + c_3f^*(v).$$

Recall from Assumption 1 that

$$\tilde{R}(h,\bar{x}_0) := \tilde{R}(\alpha,\beta,\tau;h,\bar{x}_0) := \min_v\ \frac{1}{2}\left\|v - c_1h - c_2\frac{\bar{x}_0}{\sqrt{n}}\right\|^2 + c_3f^*(v) \tag{41}$$

converges to $F := F(c_1,c_2,c_3)$ in probability. Also, recall $x_0 = \bar{x}_0/\|\bar{x}_0\|$. Next, we show that for all constants $C > 0$

$$|R(h,x_0) - \tilde{R}(h,\bar{x}_0)| \le C \tag{42}$$

with probability approaching one in the limit of $n \to \infty$. Combining this with Assumption 1 will prove that $R(h,x_0)$ converges to $F$ in probability.

Proof of (42): Fix any $\epsilon > 0$. We condition on the following events:

$$|h^Tx_0| \le \epsilon, \qquad 1 - \epsilon \le n^{-1/2}\|\bar{x}_0\| \le 1 + \epsilon. \tag{43}$$

Each one of the events occurs with probability approaching one as $n \to \infty$; the first follows since $h$ has i.i.d. $\mathcal{N}(0, 1/n)$ entries (after the rescaling $h := n^{-1/2}h$), $\|x_0\| = 1$, and from standard tail bounds on Gaussians; the second is due to condition (b) of Section II-A. Without loss of generality assume $R(h,x_0) \ge \tilde{R}(h,\bar{x}_0)$, and let $v_*$ be optimal in (41), then

$$|R(h,x_0) - \tilde{R}(h,\bar{x}_0)| \le \frac{1}{2}\|v_* - c_1P^\perp h - c_2x_0\|^2 - \frac{1}{2}\left\|v_* - c_1h - c_2\frac{\bar{x}_0}{\sqrt{n}}\right\|^2$$

$$= \left[c_1(x_0^Th)x_0 + c_2x_0\left(\frac{\|\bar{x}_0\|}{\sqrt{n}} - 1\right)\right]^T\left[v_* - c_1h - \frac{1}{2}c_2x_0\left(1 + \frac{\|\bar{x}_0\|}{\sqrt{n}}\right) + \frac{1}{2}c_1(x_0^Th)x_0\right]$$

$$= -\frac{1}{2}c_1^2(x_0^Th)^2 + c_1(x_0^Th)(x_0^Tv_*) - c_1c_2(x_0^Th)\frac{\|\bar{x}_0\|}{\sqrt{n}} + c_2(x_0^Tv_*)\left(\frac{\|\bar{x}_0\|}{\sqrt{n}} - 1\right) - \frac{1}{2}c_2^2\left(\frac{\|\bar{x}_0\|^2}{n} - 1\right)$$

$$\le \frac{1}{2}c_1^2\epsilon^2 + c_1\|v_*\|\epsilon + c_1c_2\epsilon(1+\epsilon) + c_2\|v_*\|\epsilon + \frac{1}{2}c_2^2\epsilon(2+\epsilon), \tag{44}$$

where the last line follows after bounding the absolute values of the summands using (43). Recall now that $\|v_*\| \le K_v < \infty$ and also $c_1, c_2, c_3$ are bounded constants (independent of $n$). Then, for all $C > 0$ in (42) we can find sufficiently small $\epsilon > 0$ such that the value of the last expression in the panel above is no larger than $C$, thus completing the proof of (42).

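Thus, we have shown that $G(\alpha,\beta,\tau)$ in (39) converges pointwise to

$$H(\alpha,\beta,\tau) := \beta\sqrt{\delta}\sqrt{\sigma^2 + \alpha^2} - \frac{\alpha\tau}{2} + \frac{\mu^2\tau}{2\alpha} - \frac{\alpha\lambda^2}{\tau}F\left(\frac{\beta}{\lambda}, \frac{\tau\mu}{\alpha\lambda}, \frac{\tau}{\alpha\lambda}\right),$$

in the limit of $n \to \infty$. Note that $H$ is strongly convex in $\alpha$ and jointly concave in $\beta, \tau$ since taking limits does not affect convexity properties (recall that $G$ is convex-concave). Also, we showed that $\alpha_*(h,x_0) > 0$ for the optimal in (39). With these, it follows as per [NM94, Thm. 2.7] that (i)

$$\max_{0\le\beta\le 1,\ \tau\ge 0}\ \min_{\alpha\ge 0}\ G(\alpha,\beta,\tau)\ \xrightarrow{P}\ \max_{0\le\beta\le 1,\ \tau\ge 0}\ \min_{\alpha\ge 0}\ H(\alpha,\beta,\tau), \tag{45}$$

and, (ii) $\alpha_*(h,x_0) \xrightarrow{P} \alpha_*$, where $\alpha_*$ is the unique minimizer of the second optimization in (45). This completes the proof of the Theorem.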


B. Theorem 2.1 

The theorem is a direct consequence of Theorem 2.2. In particular, Theorem 2.2 proves that the value $\alpha_*$ to
which the error converges only depends on $g$ through the parameters $\mu$ and $\sigma^2$. Those are the same (by definition)
for the non-linear and the linear case considered, thus the errors are the same.


Appendix 

Proof of Theorem 2.3 


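Specializing Theorem 2.2 to the setup of Section II-C1 we showed in the same section that $\|\hat{x} - \mu x_0\|$ converges in probability to the unique minimizer $\alpha_*$ of the following max-min problem:

$$\max_{\substack{0\le\beta\le 1 \\ \tau\ge 0}}\ \min_{\alpha\ge 0}\ H(\alpha,\beta,\tau) := \beta\sqrt{\delta}\sqrt{\alpha^2 + \sigma^2} - \frac{\alpha\tau}{2} + \frac{\mu^2\tau}{2\alpha} - \frac{\alpha\lambda^2}{2\tau}\mathbb{E}\left[\eta^2\left(\frac{\beta}{\lambda}h + \frac{\mu\tau}{\lambda\alpha}\bar{X}_0; 1\right)\right], \tag{46}$$

where the expectation is over $h \sim \mathcal{N}(0,1)$ and $\bar{X}_0 \sim p_{\bar{X}_0}$. Here, we prove Theorem 2.3 by analyzing the optimality conditions of (46). Recall as in the proof of Theorem 2.2 that $H$ is jointly concave in $\beta, \tau$ and strongly convex in $\alpha$.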














A. First Order Optimality Conditions. 

We begin with a lemma, which characterizes the first-order optimality conditions of (46). 

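Lemma A.1 (Optimality Conditions): Consider the following pair of equations with respect to $\beta$ and $\kappa$:

$$\beta^2\kappa^2\delta = \sigma^2 + \mathbb{E}\left[\left(\eta(\beta\kappa h + \mu\bar{X}_0; \kappa\lambda) - \mu\bar{X}_0\right)^2\right], \tag{47}$$

$$\beta\kappa\delta = \mathbb{E}\left[\eta(\beta\kappa h + \mu\bar{X}_0; \kappa\lambda)\cdot h\right]. \tag{48}$$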

Also, define $\lambda_{\min}$ to be the unique non-negative solution to the equation

$$(1 + x^2)\int_{-\infty}^{-x}e^{-z^2/2}\,dz - xe^{-x^2/2} = \delta\sqrt{\frac{\pi}{2}}.$$

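With these, let $(\beta_*, \tau_*, \alpha_*)$ be optimal in (46). Then,

$$\alpha_*^2 = \beta_*^2\kappa_*^2\delta - \sigma^2 \quad\text{and}\quad \kappa_* = \frac{\sigma}{\sqrt{\beta_*^2\delta - \tau_*^2}}, \tag{49}$$

such that,

(i) If $\beta_* = 1$ and $\lambda > \lambda_{\min}$, then $\kappa_*$ is the unique solution to (47) for $\beta = 1$,

(ii) If $\beta_* \in (0,1)$, then $\kappa_*, \beta_*$ are solutions to the pair of equations (47)-(48).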
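The threshold $\lambda_{\min}$ has no closed form, but it can be found by a one-dimensional root search. The sketch below assumes the defining equation is normalized as $(1+x^2)\Phi(-x) - x\phi(x) = \delta/2$, with $\Phi$ and $\phi$ the standard normal CDF and density; this normalization and the bracketing interval $[0, 10]$ are assumptions of the sketch. The left-hand side is decreasing on $x \ge 0$ and equals $1/2$ at $x = 0$, so plain bisection suffices for $0 < \delta \le 1$.

```python
import math

def normal_cdf(x):
    return 0.5 * math.erfc(-x / math.sqrt(2.0))

def normal_pdf(x):
    return math.exp(-x * x / 2.0) / math.sqrt(2.0 * math.pi)

def lambda_min_lhs(x):
    """Left-hand side (1 + x^2) Phi(-x) - x phi(x), decreasing for x >= 0."""
    return (1.0 + x * x) * normal_cdf(-x) - x * normal_pdf(x)

def lambda_min(delta, tol=1e-12):
    """Bisection for the non-negative root of lambda_min_lhs(x) = delta / 2."""
    target = delta / 2.0
    lo, hi = 0.0, 10.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if lambda_min_lhs(mid) > target:
            lo = mid        # still above the target: the root is to the right
        else:
            hi = mid
    return 0.5 * (lo + hi)

lam_half = lambda_min(0.5)
lam_quarter = lambda_min(0.25)
```

Under this normalization $\lambda_{\min}$ is zero at $\delta = 1$ and grows as $\delta$ decreases, which matches the statement $\lambda_{\min} > 0$ for $\delta < 1$ used later in the appendix.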


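Proof: Let us compute $\frac{\partial}{\partial\alpha}H(\beta,\alpha,\tau)$ and $\frac{\partial}{\partial\tau}H(\beta,\alpha,\tau)$. For convenience define

$$P\left(\frac{\tau}{\alpha}\right) := \frac{\tau\mu^2}{2\alpha} - \frac{\alpha\lambda^2}{2\tau}\mathbb{E}\left[\eta^2\left(\frac{\beta}{\lambda}h + \frac{\tau\mu}{\alpha\lambda}\bar{X}_0; 1\right)\right].$$

Taking derivatives in (46) with respect to $\alpha$ and $\tau$ and equating them with zero gives

$$\beta\sqrt{\delta}\frac{\alpha}{\sqrt{\alpha^2 + \sigma^2}} - \frac{\tau}{2} - \frac{\tau}{\alpha^2}P'\left(\frac{\tau}{\alpha}\right) = 0, \tag{50a}$$

$$-\frac{\alpha}{2} + \frac{1}{\alpha}P'\left(\frac{\tau}{\alpha}\right) = 0. \tag{50b}$$

Here, $P'$ is the derivative of $P(x)$ with respect to $x$. Any optimal $\beta_*, \tau_*, \alpha_*$ satisfies these. Then, it only takes multiplying (50b) by $\frac{\tau}{\alpha}$ and adding the result to (50a) to see that

$$\alpha_* = \frac{\tau_*\sigma}{\sqrt{\beta_*^2\delta - \tau_*^2}}. \tag{51}$$

Next, substituting (51) in (50b) it can be shown that,

$$-\frac{\tau^2}{2} + \frac{1}{2}\mathbb{E}\left[\left(\eta\left(\beta h + \frac{\sqrt{\beta^2\delta - \tau^2}}{\sigma}\mu\bar{X}_0; \lambda\right) - \frac{\sqrt{\beta^2\delta - \tau^2}}{\sigma}\mu\bar{X}_0\right)^2\right] = 0.$$

To reach this we have also used the following facts: $\eta(x;\lambda)\frac{\partial}{\partial x}\eta(x;\lambda) = \eta(x;\lambda)$, $\lambda\eta(\frac{x}{\lambda};1) = \eta(x;\lambda)$ and $\mathbb{E}[\bar{X}_0^2] = 1$ by assumption. Multiplying the result with $2\sigma^2/(\beta^2\delta - \tau^2)$ and defining

$$\kappa := \frac{\sigma}{\sqrt{\beta^2\delta - \tau^2}},$$

we conclude with,

$$\beta^2\delta\kappa^2 - \sigma^2 = \mathbb{E}\left[\left(\eta(\beta\kappa h + \mu\bar{X}_0; \kappa\lambda) - \mu\bar{X}_0\right)^2\right], \tag{52}$$

which is the same as (47). Also, with respect to the optimal $\kappa_*$ it is easily seen by (51) that $\kappa_* = \alpha_*/\tau_*$ and

$$\alpha_*^2 = \beta_*^2\kappa_*^2\delta - \sigma^2. \tag{53}$$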
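The facts about $\eta$ invoked in the proof are elementary properties of soft-thresholding. The sketch below assumes $\eta(x;\lambda)$ denotes the usual soft-thresholding operator $\mathrm{sign}(x)\max(|x| - \lambda, 0)$ and checks, at a few arbitrary test points, the identities $\eta\cdot\partial_x\eta = \eta$, $\lambda\eta(x/\lambda;1) = \eta(x;\lambda)$ and the rescaling $\kappa\eta(x/\kappa;\lambda) = \eta(x;\kappa\lambda)$ used to pass between the different forms of the expectations.

```python
def soft_threshold(x, lam):
    """eta(x; lam) = sign(x) * max(|x| - lam, 0)."""
    if x > lam:
        return x - lam
    if x < -lam:
        return x + lam
    return 0.0

def soft_threshold_derivative(x, lam):
    """Derivative of eta(.; lam) with respect to x (away from the kinks)."""
    return 1.0 if abs(x) > lam else 0.0

def max_identity_violation(points, lam, kappa):
    """Largest absolute violation of the three identities over the test points."""
    worst = 0.0
    for x in points:
        eta = soft_threshold(x, lam)
        worst = max(worst, abs(eta * soft_threshold_derivative(x, lam) - eta))
        worst = max(worst, abs(lam * soft_threshold(x / lam, 1.0) - eta))
        worst = max(worst, abs(kappa * soft_threshold(x / kappa, lam)
                               - soft_threshold(x, kappa * lam)))
    return worst

violation = max_identity_violation([-3.2, -1.0, -0.3, 0.0, 0.4, 1.7, 5.0], lam=0.8, kappa=2.5)
```

All three identities hold exactly in exact arithmetic, so `violation` should be at the level of rounding error.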
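The derivative in (46) with respect to $\beta$ gives

$$\frac{\partial}{\partial\beta}H(\alpha,\beta,\tau) = \sqrt{\delta}\sqrt{\sigma^2 + \alpha^2} - \frac{\alpha}{\tau}\mathbb{E}\left[\eta\left(\beta h + \frac{\mu\tau}{\alpha}\bar{X}_0; \lambda\right)h\right]$$

$$= \beta\delta\kappa - \kappa\,\mathbb{E}\left[\eta\left(\beta h + \frac{\mu}{\kappa}\bar{X}_0; \lambda\right)h\right] = \beta\delta\kappa - \mathbb{E}\left[\eta(\kappa\beta h + \mu\bar{X}_0; \lambda\kappa)h\right], \tag{54}$$

where we have also used (53). Note that equating the above to zero is the same as (48); recalling the constraint $0 \le \beta \le 1$ in (46), we conclude with the desired.

It only remains to show that the solution with respect to $\kappa$ of (47) (equivalently, of (52)) is unique when $\beta = 1$ and $\lambda > \lambda_{\min}$. For $\beta = 1$, (47) is the same as the fixed point equation [BM12, Eqn. (1.9)], which in turn was shown to admit a unique solution for all $\lambda > \lambda_{\min}$ in [DMM11] (see [BM12, Prop. 1.3]). ■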

B. The Regions of Operation 

We build up to the proof of Theorem 2.3 through a series of auxiliary lemmas. Through the lemmas, we identify two "regimes of operation" of the LASSO. The first, we call $\mathcal{R}_{\text{bad}}$, and it corresponds to values of $\lambda$ for which the optimal $\beta$ is in the open set $(0,1)$. The second regime is such that $\beta = 1$. If $\delta < 1$, we prove in Lemma A.5 that there exists a unique critical value $\lambda_{\text{crit}}$ separating the two regimes in the sense that $\mathcal{R}_{\text{bad}}$ extends from 0 to $\lambda_{\text{crit}}$. If on the other hand $\delta > 1$, then there is no $\mathcal{R}_{\text{bad}}$ region (Lemma A.6).

First, we need a few useful definitions. 

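Definition A.1: For any $\lambda > 0$, we let $\alpha_*(\lambda)$, $\tau_*(\lambda)$ and $\beta_*(\lambda)$ be optimal solutions in (46). Apart from $\alpha_*(\lambda)$, the others are not necessarily unique at this point. Also, $\kappa_*(\lambda)$ is defined as in (49).

Definition A.2 (Bad Regime): We say that a value $\lambda > 0$ is in the bad regime $\mathcal{R}_{\text{bad}}$, denoted $\lambda \in \mathcal{R}_{\text{bad}}$, if there exists $\beta_*(\lambda) \in (0,1)$.

Definition A.3 (Critical Regime): We say that a value $\lambda_{\text{crit}} > 0$ is in the critical regime $\mathcal{R}_{\text{crit}}$, denoted $\lambda_{\text{crit}} \in \mathcal{R}_{\text{crit}}$, if for some $\kappa_{\text{crit}}$, the pair $\lambda_{\text{crit}}, \kappa_{\text{crit}}$ solves:

$$\kappa^2\delta = \sigma^2 + \mathbb{E}\left[\left(\eta(\kappa h + \mu\bar{X}_0; \kappa\lambda) - \mu\bar{X}_0\right)^2\right], \tag{55}$$

$$\kappa\delta = \mathbb{E}\left[\eta(\kappa h + \mu\bar{X}_0; \kappa\lambda)\cdot h\right]. \tag{56}$$

As an immediate consequence of the definition above and the first order optimality conditions in Lemma A.1, we have

$$\beta_*(\lambda_{\text{crit}}) = 1, \quad \kappa_*(\lambda_{\text{crit}}) = \kappa_{\text{crit}} \quad\text{and}\quad \alpha_*(\lambda_{\text{crit}}) = \sqrt{\delta\kappa_{\text{crit}}^2 - \sigma^2}. \tag{57}$$

Also, the following lemma reveals the importance of $\lambda_{\text{crit}}$: all $\lambda < \lambda_{\text{crit}}$ are in $\mathcal{R}_{\text{bad}}$ and the squared error is constant in that regime, i.e. $\alpha_*(\lambda) = \alpha_*(\lambda_{\text{crit}})$.

Lemma A.2 (Error in $\mathcal{R}_{\text{bad}}$): Let $\lambda_{\text{crit}} \in \mathcal{R}_{\text{crit}}$. Then, for all $0 < \lambda' < \lambda_{\text{crit}}$, it holds $\lambda' \in \mathcal{R}_{\text{bad}}$. Furthermore, $\beta_*(\lambda') = \lambda'/\lambda_{\text{crit}}$, $\kappa_*(\lambda') = \kappa_{\text{crit}}\lambda_{\text{crit}}/\lambda'$ and $\alpha_*(\lambda') = \alpha_*(\lambda_{\text{crit}})$.

Proof: Fix any $0 < \lambda' < \lambda_{\text{crit}}$. By definition, there exists $\kappa_{\text{crit}}$ such that $\lambda_{\text{crit}}, \kappa_{\text{crit}}$ satisfy (55)-(56). Define $\beta' := \lambda'/\lambda_{\text{crit}}$ and $\kappa' := \kappa_{\text{crit}}/\beta'$. Since $\beta'\kappa' = \kappa_{\text{crit}}$ and $\kappa'\lambda' = \kappa_{\text{crit}}\lambda_{\text{crit}}$, it is then easy to see that $\beta', \kappa'$ solve (47)-(48) (for $\lambda = \lambda'$ therein). Also, $\beta' < 1$ by definition. Thus, $\lambda' \in \mathcal{R}_{\text{bad}}$ and $\beta_*(\lambda') = \lambda'/\lambda_{\text{crit}}$, $\kappa_*(\lambda') = \kappa_{\text{crit}}\lambda_{\text{crit}}/\lambda'$. Also, using (49) and (57), $\alpha_*(\lambda') = \sqrt{\delta\beta_*^2(\lambda')\kappa_*^2(\lambda') - \sigma^2} = \sqrt{\delta\kappa_*^2(\lambda_{\text{crit}}) - \sigma^2} = \alpha_*(\lambda_{\text{crit}})$. ■

It is thus important to identify the critical values of the regularizer parameter, i.e. all $\lambda_{\text{crit}} \in \mathcal{R}_{\text{crit}}$. Values in $\mathcal{R}_{\text{bad}}$ are important in this direction, since as shown in the next lemma, for any $\lambda \in \mathcal{R}_{\text{bad}}$ there must exist some $\lambda_{\text{crit}} > \lambda$.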
Lemma A.3 ($\mathcal{R}_{\text{bad}} \to \mathcal{R}_{\text{crit}}$): Let $\lambda_1 \in \mathcal{R}_{\text{bad}}$, then there exists $\lambda_2 \in \mathcal{R}_{\text{crit}}$ with $\lambda_2 > \lambda_1$.

Proof: Let $\beta_1, \alpha_1, \kappa_1$ be optimal corresponding to $\lambda_1$. Since $\lambda_1 \in \mathcal{R}_{\text{bad}}$, it holds $0 < \beta_1 < 1$. Then, from Lemma A.1, $\kappa_1, \beta_1$ solve (47)-(48). Starting from these and substituting $\lambda_2 := \lambda_1/\beta_1$ and $\kappa_2 := \kappa_1\beta_1$ therein, it is not hard to see that this is equivalent with $\lambda_2, \kappa_2$ satisfying (55)-(56). Thus, $\lambda_2 \in \mathcal{R}_{\text{crit}}$. Also, clearly $\lambda_2 > \lambda_1$. ■

The lemma below is important since it shows that when $\delta < 1$ there exists a unique $\lambda_{\text{crit}} \in \mathcal{R}_{\text{crit}}$.

Lemma A.4 (Unique $\lambda_{\text{crit}}$): Suppose $\delta < 1$. The set of equations (55)-(56) has a unique pair of solutions $(\kappa, \lambda)$. Thus, there exists a unique $\lambda_{\text{crit}} \in \mathcal{R}_{\text{crit}}$.

Proof: First, we show that there exists at most one $\lambda_{\text{crit}} \in \mathcal{R}_{\text{crit}}$. For the sake of contradiction assume two different pairs of solutions, say $(\kappa_1, \lambda_1)$ and $(\kappa_2, \lambda_2)$. By definition, $\lambda_1, \lambda_2 \in \mathcal{R}_{\text{crit}}$. First, note that we cannot have $\lambda_1 = \lambda_2$, since if this was the case then from (57) we would also have $\kappa_1 = \kappa_2$. Henceforth, assume w.l.o.g. that $\lambda_1 < \lambda_2$. It follows from Lemma A.2 that $\lambda_1 \in \mathcal{R}_{\text{bad}}$ and also $\kappa_*(\lambda_1)\lambda_1 = \kappa_*(\lambda_2)\lambda_2$. Thus,

$$\kappa_*(\lambda_1) > \kappa_*(\lambda_2). \tag{58}$$

But also, again from Lemma A.2, $\alpha_*(\lambda_1) = \alpha_*(\lambda_2)$. Since $\lambda_1, \lambda_2 \in \mathcal{R}_{\text{crit}}$, this implies when combined with (57) that $\kappa_*(\lambda_1) = \kappa_*(\lambda_2)$, which contradicts (58), completing the proof of this part.

Let us now prove that $\mathcal{R}_{\text{crit}}$ is non-empty. To begin with, we show that $\mathcal{R}_{\text{bad}}$ is non-empty in this case. In particular, we show that $\lambda_{\min}$ defined in Lemma A.1 is in $\mathcal{R}_{\text{bad}}$. Since $\delta < 1$, we have $\lambda_{\min} > 0$. Suppose that $(\beta_*(\lambda_{\min}) = 1, \kappa_*(\lambda_{\min}))$ is optimal for some $\kappa_*(\lambda_{\min})$, then, from first-order optimality conditions, $\kappa_*(\lambda_{\min}), \lambda_{\min}$ solves (47) for $\beta = 1$. But then, as in [BM12, pg. 16], $\kappa_*(\lambda_{\min}) \to \infty$. Also, since $H(\alpha,\tau,\beta)$ is concave in $\beta$, the above imply that $\frac{\partial H}{\partial\beta}\big|_{\beta=1} \ge 0$, or equivalently from (54),

$$\sqrt{\frac{2}{\pi}}\int_{\lambda_{\min}}^{\infty}h(h - \lambda_{\min})e^{-h^2/2}\,dh \le \delta.$$

Recalling the definition of $\lambda_{\min}$ in Lemma A.1, it can be shown (using standard inequalities on tail functions of Gaussians) that the inequality above is violated for all $0 < \delta < 1$. Hence, it must be $\beta_*(\lambda_{\min}) < 1$. Also, $\beta_*(\lambda_{\min}) > 0$ because of (51). Thus, $\lambda_{\min} \in \mathcal{R}_{\text{bad}}$. To complete the proof, use Lemma A.3 with $\lambda_1 = \lambda_{\min}$ to see that there exists $\lambda_2 \in \mathcal{R}_{\text{crit}}$. ■

Lemma A.5 ($\delta < 1$): Suppose $\delta < 1$ and let $\lambda_{\text{crit}} \in \mathcal{R}_{\text{crit}}$. Then, i) for all $\lambda < \lambda_{\text{crit}}$, $\alpha_*(\lambda) = \alpha_*(\lambda_{\text{crit}})$, and, ii) for all $\lambda > \lambda_{\text{crit}}$, $\kappa_*(\lambda)$ is the unique solution to (47) for $\beta = 1$.

Proof: Existence and uniqueness of $\lambda_{\text{crit}}$ is proved in Lemma A.4.

i) For $\lambda < \lambda_{\text{crit}}$, the claim follows directly from Lemma A.2.

ii) Next, we show that for $\lambda > \lambda_{\text{crit}}$, there exists an optimal solution for which $\beta_*(\lambda) = 1$. This suffices since then $\kappa_*(\lambda)$ is indeed solving (47) for $\beta = 1$ (by first order optimality conditions), and, also, the solution is unique by [DMM11], [BM12, Prop. 1.3] and the fact that $\lambda_{\min} < \lambda_{\text{crit}} < \lambda$. To see that $\beta_*(\lambda) = 1$, we argue as follows. First, $\beta_*(\lambda) \notin (0,1)$. Otherwise, $\lambda \in \mathcal{R}_{\text{bad}}$, thus, by Lemma A.3 there exists $\lambda' > \lambda > \lambda_{\text{crit}}$ such that $\lambda' \in \mathcal{R}_{\text{crit}}$, which contradicts the uniqueness of $\lambda_{\text{crit}}$. Hence, $\beta_*(\lambda) = 1$. ■

Lemma A.6 ($\delta > 1$): Suppose $\delta > 1$, then for all $\lambda > 0$, $\kappa_*(\lambda)$ is the unique solution to (47) for $\beta = 1$.

Proof: First, let us show that for $\lambda \to 0$, the optimal $\beta_*(\lambda) = 1$. Indeed for $\beta = 1$ and $\lambda \to 0$, (54) gives

$$\frac{1}{\kappa}\frac{\partial H}{\partial\beta} = \delta - \mathbb{E}\left[\left(h + \frac{\mu}{\kappa}\bar{X}_0\right)h\right] = \delta - 1 > 0.$$

Thus, from concavity of $H$ with respect to $\beta$, we find that the unique optimal value for $\beta$ is

$$\beta_*(\lambda \to 0) = 1. \tag{59}$$

Also, as in the proof of Lemma A.5, $\beta_*(\lambda \to \infty) = 1$. Thus, again similar to Lemma A.5, it suffices to prove that there exists no $\lambda \in \mathcal{R}_{\text{bad}}$. For the sake of contradiction, suppose that there exists $\lambda_1 \in \mathcal{R}_{\text{bad}}$. By Lemma A.3, there exists $\lambda_1 < \lambda_{\text{crit}} \in \mathcal{R}_{\text{crit}}$. But then, by Lemma A.2, $\beta_*(\lambda) = \lambda/\lambda_{\text{crit}} \to 0$ as $\lambda \to 0$, which contradicts (59). This completes the proof. ■


Proof: (of Theorem 2.3) The claim of the theorem is now a direct consequence of Lemmas A.5 and A.6
combined with (53). ■


Appendix 

Proofs for Section II-D 


A. The LM Algorithm 

The Lloyd-Max algorithm is an algorithm for finding the quantization thresholds $t_i$ and the representation points $\ell_i$. Given real values $x \in \mathbb{R}$ sampled from some probability density $\phi(x)$ it looks for optimal sets $t, \ell$ that minimize the mean-square-error (MSE) between $x$ and their corresponding quantized values $Q_q(x;\ell,t)$, i.e.

$$(\hat\ell, \hat{t}) := \arg\min_{\ell,t}\ \mathbb{E}_x\left[(x - Q_q(x;\ell,t))^2\right]. \tag{60}$$

The algorithm simply alternates between i) optimizing the thresholds $t_i$ for a given set of $\ell_i$, and then ii) optimizing the levels $\ell_i$ for the new thresholds. It is well known that the converging points $\ell^{LM}, t^{LM}$ of the algorithm satisfy

$$t_i^{LM} = \frac{\ell_i^{LM} + \ell_{i+1}^{LM}}{2}, \qquad i = 1,\dots,L-1, \tag{61a}$$

$$\ell_i^{LM} = \frac{\int_{t_{i-1}^{LM}}^{t_i^{LM}}x\phi(x)\,dx}{\int_{t_{i-1}^{LM}}^{t_i^{LM}}\phi(x)\,dx}, \qquad i = 1,\dots,L. \tag{61b}$$

Furthermore, they are stationary points of the objective function in (60).
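As an illustration of the alternation described above, here is a minimal sketch of the Lloyd-Max iteration for a standard Gaussian density and a symmetric quantizer, so that only the positive levels $\ell_1 < \dots < \ell_L$ and thresholds $t_0 = 0 < t_1 < \dots < t_L = \infty$ are tracked. The initialization, the fixed iteration count and the function names are arbitrary choices of this sketch. It uses the closed forms $\int_a^b x\phi(x)\,dx = \phi(a) - \phi(b)$ and $\int_a^b \phi(x)\,dx = Q(a) - Q(b)$.

```python
import math

def pdf(x):
    """Standard normal density; evaluates to 0.0 at math.inf."""
    return math.exp(-x * x / 2.0) / math.sqrt(2.0 * math.pi)

def tail(x):
    """Q(x) = P(gamma > x) for a standard normal gamma."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def midpoints(levels):
    """Threshold update (61a): thresholds are midpoints of adjacent levels."""
    inner = [(levels[i] + levels[i + 1]) / 2.0 for i in range(len(levels) - 1)]
    return [0.0] + inner + [math.inf]

def centroids(thresholds):
    """Level update (61b): each level is the conditional mean of its cell."""
    return [
        (pdf(thresholds[i]) - pdf(thresholds[i + 1]))
        / (tail(thresholds[i]) - tail(thresholds[i + 1]))
        for i in range(len(thresholds) - 1)
    ]

def lloyd_max_gaussian(num_levels, iterations=500):
    """Alternate (61a) and (61b) from an arbitrary increasing initialization."""
    levels = [i + 0.5 for i in range(num_levels)]
    for _ in range(iterations):
        levels = centroids(midpoints(levels))
    return levels, midpoints(levels)

levels_2bit, thresholds_2bit = lloyd_max_gaussian(2)
```

With two positive levels (a 2-bit symmetric quantizer) the iteration settles near levels 0.4528 and 1.510 with inner threshold 0.9816, the classical 4-level Gaussian quantizer.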

1) Gaussian case: Assume that the values $x$ are sampled from a standard Gaussian distribution, i.e. $x \sim \mathcal{N}(0,1)$ and $\phi(x) = (1/\sqrt{2\pi})\exp(-x^2/2)$. Also, recall the definition of the parameters $\mu, \sigma^2$ in (3); setting $g = Q_q$ therein, we find

$$\mu := \mu(\ell,t) = 2\sum_{i=1}^{L}\ell_i\int_{t_{i-1}}^{t_i}x\phi(x)\,dx, \tag{62a}$$

$$\tau^2 := \tau^2(\ell,t) = 2\sum_{i=1}^{L}\ell_i^2\int_{t_{i-1}}^{t_i}\phi(x)\,dx. \tag{62b}$$

In this notation, the objective in (60) can be written as $\tau^2 - 2\mu + 1$. Thus, $\ell^{LM}, t^{LM}$ satisfy

$$(\tau^2)'\big|_{(\ell^{LM},t^{LM})} = 2\mu'\big|_{(\ell^{LM},t^{LM})}. \tag{63}$$

Here and onwards we use $(\tau^2)'$, $\mu'$ to denote the gradient of $\tau^2$ and $\mu$ with respect to the vector $[\ell^T, t^T]$. The gradients are evaluated at the point $(\ell^{LM}, t^{LM})$ in (63).
B. q-Bit Compressive Sensing

We prove that the LM algorithm is an efficient algorithm when the objective is minimizing the LASSO reconstruction error of a signal $x_0$ to which we have access through $q$-bit quantized linear measurements $Q_q(a^Tx_0;\ell,t)$. It was shown in Section II-D2 that the problem can be posed as that of finding $\ell_*, t_*$ such that

$$(t_*, \ell_*) = \arg\min_{t,\ell}\ \frac{\sigma^2(t,\ell)}{\mu^2(t,\ell)} = \arg\min_{t,\ell}\ \frac{\tau^2(t,\ell)}{\mu^2(t,\ell)}. \tag{64}$$

The following Lemma proves the claim made in Section II-D3, i.e. the converging points of the LM algorithm are stationary points of the objective function in (64).

Lemma A.1: The converging points of the LM algorithm, say $(t^{LM}, \ell^{LM})$, satisfy

$$\frac{\partial}{\partial\ell_i}\left(\frac{\tau^2(\ell,t)}{\mu^2(\ell,t)}\right)\bigg|_{(\ell,t)=(\ell^{LM},t^{LM})} = 0, \quad i = 1,\dots,L, \qquad \frac{\partial}{\partial t_i}\left(\frac{\tau^2(\ell,t)}{\mu^2(\ell,t)}\right)\bigg|_{(\ell,t)=(\ell^{LM},t^{LM})} = 0, \quad i = 1,\dots,L-1. \tag{65}$$

Proof: Call $R(t,\ell) = \frac{\tau^2(t,\ell)}{\mu^2(t,\ell)}$ and denote $R' := R'(t,\ell)$ for its gradient with respect to the vector $[t^T, \ell^T]$. It suffices to prove that $R'\big|_{(t^{LM},\ell^{LM})} = 0$, or equivalently, that at the point $(t,\ell) = (t^{LM},\ell^{LM})$ the following holds:

$$(\tau^2)'\mu^2 = 2\tau^2\mu\mu'. \tag{66}$$

To see that this is the case, note that

$$\tau^2(t^{LM},\ell^{LM}) = \mu(t^{LM},\ell^{LM}). \tag{67}$$

This follows by direct substitution of (61) in (62). Then, (66) follows from (67) and (63).
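The two facts used in the proof, $\tau^2 = \mu$ at the Lloyd-Max point and the vanishing gradient of $\tau^2/\mu^2$ there, can be checked numerically. The sketch below is an illustration under the same assumptions as before (standard Gaussian density, symmetric quantizer with $t_0 = 0$, $t_L = \infty$). It runs a plain Lloyd-Max iteration, evaluates (62a)-(62b) in closed form, and approximates the gradient of the ratio by central differences; all names and the step size are choices of this sketch.

```python
import math

def pdf(x):
    return math.exp(-x * x / 2.0) / math.sqrt(2.0 * math.pi)

def tail(x):
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def moments(levels, inner):
    """mu and tau^2 from (62a)-(62b); `inner` holds t_1, ..., t_{L-1}."""
    t = [0.0] + list(inner) + [math.inf]
    mu = 2.0 * sum(l * (pdf(t[i]) - pdf(t[i + 1])) for i, l in enumerate(levels))
    tau2 = 2.0 * sum(l * l * (tail(t[i]) - tail(t[i + 1])) for i, l in enumerate(levels))
    return mu, tau2

def ratio(levels, inner):
    """Objective tau^2 / mu^2 of (64)."""
    mu, tau2 = moments(levels, inner)
    return tau2 / (mu * mu)

def lloyd_max(num_levels, iterations=2000):
    """Alternate the midpoint rule (61a) and the centroid rule (61b)."""
    levels = [i + 0.5 for i in range(num_levels)]
    for _ in range(iterations):
        inner = [(levels[i] + levels[i + 1]) / 2.0 for i in range(num_levels - 1)]
        t = [0.0] + inner + [math.inf]
        levels = [(pdf(t[i]) - pdf(t[i + 1])) / (tail(t[i]) - tail(t[i + 1]))
                  for i in range(num_levels)]
    inner = [(levels[i] + levels[i + 1]) / 2.0 for i in range(num_levels - 1)]
    return levels, inner

def numerical_gradient(levels, inner, step=1e-5):
    """Central-difference gradient of the ratio w.r.t. all levels and thresholds."""
    point = list(levels) + list(inner)
    k = len(levels)
    grad = []
    for j in range(len(point)):
        up, down = list(point), list(point)
        up[j] += step
        down[j] -= step
        grad.append((ratio(up[:k], up[k:]) - ratio(down[:k], down[k:])) / (2.0 * step))
    return grad

lm_levels, lm_inner = lloyd_max(2)
lm_mu, lm_tau2 = moments(lm_levels, lm_inner)
lm_grad = numerical_gradient(lm_levels, lm_inner)
```

At the computed point, `lm_tau2` and `lm_mu` coincide up to rounding and every entry of `lm_grad` is numerically zero, consistent with (67) and (65).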









